Re: RE: Going it alone.

2020-04-15 Thread jane thorpe
F*U*C*K O*F*F C*U*N*T*S On Thursday, 16 April 2020 Kelvin Qin wrote: No wonder I couldn't understand what the mail was expressing; it feels like a joke…… On 2020-04-16 02:28:49, seemanto.ba...@nomura.com.INVALID wrote: Have we been tricked by a bot? From: Matt Smith

Re: [Pyspark] - Spark uses all available memory; unrelated to size of dataframe

2020-04-15 Thread jane thorpe
The Web UI only shows "The Storage Memory column shows the amount of memory used and reserved for caching data." The Web UI does not show the values of Xmx, Xms, or Xss, so you are never going to know the cause of an OutOfMemoryError or StackOverflowError. The visual tool is as useless as it
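
As a hedged illustration of the point above (not part of the original message): the heap and stack sizes come from Spark configuration, not from the Web UI, so they have to be set and inspected explicitly. The app name and sizes below are placeholders.

    from pyspark.sql import SparkSession

    # Heap (-Xmx) is derived from spark.{driver,executor}.memory; anything
    # else, such as thread stack size (-Xss), must be passed as an extra JVM
    # option. In client mode spark.driver.memory only takes effect if set
    # before the driver JVM starts (e.g. on the spark-submit command line).
    spark = (
        SparkSession.builder
        .appName("memory-flags")
        .config("spark.driver.memory", "4g")
        .config("spark.executor.memory", "8g")
        .config("spark.executor.extraJavaOptions", "-Xss4m")
        .getOrCreate()
    )

    # The Storage tab will not show these values, but the conf will:
    for key in ("spark.driver.memory", "spark.executor.memory",
                "spark.executor.extraJavaOptions"):
        print(key, "=", spark.sparkContext.getConf().get(key, "<default>"))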

Spark ORC store written timestamp as column

2020-04-15 Thread Manjunath Shetty H
Hi All, Is there any way to store the exact written timestamp in the ORC file through Spark? The use case is something like the `current_timestamp()` function in SQL. Generating the timestamp in the program will not equal the actual write time of the ORC/HDFS file. Any suggestions would be helpful. Thanks Manjunath
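
For reference, a minimal sketch of the `current_timestamp()` approach the question mentions; the path and column name are invented. The expression is evaluated when the job runs, shortly before the files land, so it approximates but does not equal the actual HDFS write time, which is exactly the gap being described.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orc-write-ts").getOrCreate()

    df = spark.range(1000)
    # Evaluated during query execution, not at the file-write instant.
    df.withColumn("written_at", F.current_timestamp()) \
      .write.mode("overwrite").orc("/tmp/orc_with_ts")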

Re: Question about how parquet files are read and processed

2020-04-15 Thread Kelvin Qin
Hi, The advantage of Parquet is that it only scans the required columns; it is a columnar storage format. The fewer columns you select, the less memory is required. Developers do not need to care about the details of loading the data; those details are well designed and invisible to users.
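
A small sketch of the column pruning being described; the path and column names are invented.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pruning-demo").getOrCreate()

    # Only the two selected columns are read from the Parquet file; the
    # physical plan shows a FileScan whose ReadSchema is limited to them.
    df = spark.read.parquet("/data/events").select("user_id", "event_time")
    df.explain()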

Re: [Pyspark] - Spark uses all available memory; unrelated to size of dataframe

2020-04-15 Thread Yeikel
The memory that you see in Spark's UI page, under Storage, is not the memory used by your processing but the amount of memory consumed by the RDDs and DataFrames you persisted. Read more here: https://spark.apache.org/docs/3.0.0-preview/web-ui.html#storage-tab We need more details to be able to
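
To make the distinction concrete, a sketch with an invented path: nothing appears under the Storage tab until something is explicitly persisted and then materialized by an action.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("storage-tab-demo").getOrCreate()

    df = spark.read.parquet("/data/events")
    df.count()      # runs a job; the Storage tab stays empty

    df.persist()    # mark for caching (df.cache() uses the default level)
    df.count()      # materializes the cache; the DataFrame now shows under Storage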

Question about how parquet files are read and processed

2020-04-15 Thread Yeikel
I have a parquet file with millions of records and hundreds of fields that I will be extracting from a cluster with more resources. I need to take that data, derive a set of tables from only some of the fields, and import them using a smaller cluster. The smaller cluster cannot load in memory the
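
One hedged reading of the plan being described, with invented paths and field names: project the narrow set of fields on the large cluster and hand the smaller cluster a dataset it can actually hold.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("narrow-extract").getOrCreate()

    # On the larger cluster: keep only the fields the derived tables need.
    wide = spark.read.parquet("/data/wide_table")
    wide.select("id", "field_a", "field_b") \
        .write.mode("overwrite").parquet("/data/narrow_extract")

    # The smaller cluster then derives its tables from /data/narrow_extract
    # instead of loading the full wide file.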

Re:[Structured Streaming] Checkpoint file compact file grows big

2020-04-15 Thread Kelvin Qin
See: http://spark.apache.org/docs/2.3.1/streaming-programming-guide.html#checkpointing "Note that checkpointing of RDDs incurs the cost of saving to reliable storage. This may cause an increase in the processing time of those batches where RDDs get checkpointed." As far as I know, the
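
A minimal DStream sketch of the trade-off in that quote; the source, durations, and checkpoint path are placeholders. Checkpointing is enabled per context, and the per-stream checkpoint interval controls how often the save-to-reliable-storage cost is paid.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="checkpoint-cost")
    ssc = StreamingContext(sc, 10)                 # 10-second batches
    ssc.checkpoint("hdfs:///checkpoints/demo")     # reliable storage

    words = ssc.socketTextStream("localhost", 9999).flatMap(str.split)
    # Windowed aggregation with an inverse function requires checkpointing.
    counts = words.map(lambda w: (w, 1)).reduceByKeyAndWindow(
        lambda a, b: a + b, lambda a, b: a - b, 60, 10)

    # A wider interval amortizes the checkpointing cost the docs warn about;
    # it must be a multiple of the slide duration (here 10s).
    counts.checkpoint(50)

    ssc.start()
    ssc.awaitTermination()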

Re: RE: Going it alone.

2020-04-15 Thread Kelvin Qin
No wonder I couldn't understand what the mail was expressing; it feels like a joke…… On 2020-04-16 02:28:49, seemanto.ba...@nomura.com.INVALID wrote: Have we been tricked by a bot? From: Matt Smith Sent: Wednesday, April 15, 2020 2:23 PM To: jane thorpe Cc:

[Structured Streaming] Checkpoint file compact file grows big

2020-04-15 Thread Ahn, Daniel
Are Spark Structured Streaming checkpoint files expected to grow over time indefinitely? Is there a recommended way to safely age-off old checkpoint data? Currently we have a Spark Structured Streaming process reading from Kafka and writing to an HDFS sink, with checkpointing enabled and
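
The compact files in the file-sink and checkpoint metadata logs are the usual growth point; the session settings below bound how much history is retained. The values are illustrative only, not recommendations, and behavior should be checked against the Spark version in use.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("checkpoint-retention")
        # How many recent micro-batches of metadata to keep (default 100).
        .config("spark.sql.streaming.minBatchesToRetain", "20")
        # How often the file-sink metadata log is compacted (default 10).
        .config("spark.sql.streaming.fileSink.log.compactInterval", "10")
        # Whether expired file-sink log entries are deleted (default true).
        .config("spark.sql.streaming.fileSink.log.deletion", "true")
        .getOrCreate()
    )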

Save Spark dataframe as dynamic partitioned table in Hive

2020-04-15 Thread Mich Talebzadeh
Hi, I have an XML file that is read into Spark using the Databricks jar file spark-xml_2.11-0.9.0.jar. Doing some tests. This is the format of the XML (one row here): //* SKY 0123456789 123456789 XYZ GLX 12345678 */ OK, I am trying to insert data into a Hive partitioned table through
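
A hedged sketch of the flow described; the rowTag, paths, and table/column names are placeholders, not taken from the message. It assumes the spark-xml jar named above is on the classpath and Hive support is enabled.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("xml-to-hive")
        .enableHiveSupport()
        .config("hive.exec.dynamic.partition", "true")
        .config("hive.exec.dynamic.partition.mode", "nonstrict")
        .getOrCreate()
    )

    df = (spark.read.format("xml")             # provided by spark-xml
          .option("rowTag", "row")             # placeholder row tag
          .load("/data/input.xml"))

    # With insertInto, columns map by position and the partition column
    # goes last; a dynamic partition is created per distinct value.
    df.select("col_a", "col_b", "part_col") \
      .write.mode("append").insertInto("mydb.partitioned_table")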

RE: Going it alone.

2020-04-15 Thread seemanto.barua
Have we been tricked by a bot? From: Matt Smith Sent: Wednesday, April 15, 2020 2:23 PM To: jane thorpe Cc: dh.lo...@gmail.com; user@spark.apache.org; janethor...@aol.com; em...@yeikel.com Subject: Re: Going it alone.

Re: Going it alone.

2020-04-15 Thread Matt Smith
This is so entertaining. 1. Ask for help. 2. Compare those you need help from to a lower-order primate. 3. Claim you provided information you did not. 4. Explain that providing any information would be "too revealing". 5. ??? Can't wait to hear what comes next, but please keep it up. This is a