Re: SPARK Storagelevel issues

2017-07-27 Thread 周康
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER): maybe the StorageLevel should change. Also check your config "spark.memory.storageFraction", whose default value is 0.5. 2017-07-28 3:04 GMT+08:00 Gourav Sengupta : > Hi, > > I cached a table in a large EMR
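
A minimal PySpark sketch of what that suggestion amounts to, assuming a hand-built SparkSession and a placeholder DataFrame named testdf; the storage level and the storageFraction value shown are illustrative, not a recommendation:

    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    # Assumption: spark.memory.storageFraction is set explicitly so the value
    # is visible; 0.5 is already the default.
    spark = (SparkSession.builder
             .appName("storage-level-demo")
             .config("spark.memory.storageFraction", "0.5")
             .getOrCreate())

    # Placeholder DataFrame standing in for the cached table from the thread.
    testdf = spark.range(1000000).toDF("id")

    # Persist with an explicit storage level; MEMORY_AND_DISK spills partitions
    # to disk instead of recomputing them when memory runs short.
    testdf.persist(StorageLevel.MEMORY_AND_DISK)
    testdf.count()  # caching is lazy, an action materializes it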

Re: [SPARK STRUCTURED STREAMING]: Alternatives to using Foreach sink in pyspark

2017-07-27 Thread Tathagata Das
For built-in SQL functions, it does not matter which language you use, as the engine will use the most optimized JVM code to execute. However, in your case, you are asking for foreach in Python. My interpretation was that you want to specify your own Python function that processes the rows in Python.
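
A rough sketch of the built-in-functions route in PySpark Structured Streaming, assuming a rate source and a console sink purely for illustration; the point is that window and count run as optimized JVM code even though the query is written in Python:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import window, count

    spark = SparkSession.builder.appName("builtin-functions-demo").getOrCreate()

    # The rate source exists only to have something streaming to aggregate.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    # Built-in aggregation: executed by the JVM engine regardless of the
    # language the query is written in.
    counts = (stream
              .groupBy(window(stream.timestamp, "10 seconds"))
              .agg(count("*").alias("events")))

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()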

unsubscribe

2017-07-27 Thread Tao Lu
unsubscribe

Re: Re: A tool to generate simulation data

2017-07-27 Thread luohui20001
thank you Suzen, I've had a try and generated 1 billion records within 1.5 min. It is fast. And I will go on to try some other cases. Thanks and best regards! San.Luo - Original Message - From: "Suzen, Mehmet" To: luohui20...@sina.com Cc: user

SPARK Storagelevel issues

2017-07-27 Thread Gourav Sengupta
Hi, I cached a table in a large EMR cluster and it has a size of 62 MB, so I know the size of the table while cached. But when I try to cache the table in a smaller cluster, which still has a total of 3 GB of driver memory and two executors with close to 2.5 GB of memory, the job still
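
For context, a minimal sketch of the kind of caching described here, with my_table as a placeholder name; the cached size (the 62 MB figure mentioned above) shows up in the Storage tab of the Spark UI once the cache is materialized:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-table-demo").getOrCreate()

    # "my_table" is a placeholder for whatever table the thread refers to.
    spark.catalog.cacheTable("my_table")

    # Caching is lazy: an action is needed before the in-memory size
    # appears in the Spark UI's Storage tab.
    spark.table("my_table").count()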

Re: Spark2.1 installation issue

2017-07-27 Thread Vikash Kumar
Hi, I have posted to the Cloudera community as well, but since it's a Spark 2 installation I thought I might get some pointers here. Thank you On Thu, Jul 27, 2017, 11:29 PM Marcelo Vanzin wrote: > Hello, > > This is a CDH-specific issue, please use the Cloudera forums / support >

Re: Spark2.1 installation issue

2017-07-27 Thread Marcelo Vanzin
Hello, This is a CDH-specific issue; please use the Cloudera forums / support line instead of the Apache group. On Thu, Jul 27, 2017 at 10:54 AM, Vikash Kumar wrote: > I have installed the spark2 parcel through Cloudera CDH 12.0. I see some issue > there. Looks like

Spark2.1 installation issue

2017-07-27 Thread Vikash Kumar
I have installed the spark2 parcel through Cloudera CDH 12.0 and I see some issue there. Looks like it didn't get configured properly. $ spark2-shell Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream at

Re: A tool to generate simulation data

2017-07-27 Thread Suzen, Mehmet
I suggest the RandomRDDs API. It provides nice tools. Writing wrappers around it might be a good idea. https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs$
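
The API in the linked Scaladoc is also exposed in PySpark; a tiny, purely illustrative sketch of generating a large random dataset with it (sizes and seed are arbitrary):

    from pyspark.mllib.random import RandomRDDs
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("random-data-demo").getOrCreate()
    sc = spark.sparkContext

    # One million standard-normal values spread over 100 partitions; the
    # numbers only demonstrate the call shape, not a benchmark setup.
    data = RandomRDDs.normalRDD(sc, size=1000000, numPartitions=100, seed=42)
    print(data.take(5))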

Re: running spark application compiled with 1.6 on spark 2.1 cluster

2017-07-27 Thread 周康
From Spark 2.x the package of Logging has changed. 2017-07-27 23:45 GMT+08:00 Marcelo Vanzin : > On Wed, Jul 26, 2017 at 10:45 PM, satishl wrote: > > is this a supported scenario - i.e., can I run an app compiled with Spark 1.6 > > on a 2.+ Spark cluster

Re: running spark application compiled with 1.6 on spark 2.1 cluster

2017-07-27 Thread Marcelo Vanzin
On Wed, Jul 26, 2017 at 10:45 PM, satishl wrote: > is this a supported scenario - i.e., can I run an app compiled with Spark 1.6 > on a 2.+ Spark cluster? In general, no. -- Marcelo

guava not compatible to hadoop version 2.6.5

2017-07-27 Thread Markus.Breuer
After upgrading from Apache Spark 2.1.1 to 2.2.0 our integration tests fail with an exception: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat at

How does Spark handle timestamps during Pandas dataframe conversion

2017-07-27 Thread saatvikshah1994
I've summarized this question in detail, with code snippets and logs, in this StackOverflow question: https://stackoverflow.com/questions/45308406/how-does-spark-handle-timestamp-types-during-pandas-dataframe-conversion/. I'm looking for efficient solutions to this.
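
The linked question is about what happens to timestamp columns when a DataFrame is pulled to the driver with toPandas(); a small generic sketch of that conversion (column names, values, and the timezone setting are placeholders, not taken from the poster's code):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("timestamp-to-pandas-demo").getOrCreate()

    # Making the session timezone explicit keeps the conversion behaviour visible.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    df = spark.createDataFrame(
        [("2017-07-27 12:00:00",), ("2017-07-27 13:30:00",)], ["ts_str"]
    ).withColumn("ts", F.to_timestamp("ts_str"))

    # toPandas() collects everything to the driver; timestamp columns come
    # back as pandas datetime64[ns] values.
    pdf = df.select("ts").toPandas()
    print(pdf.dtypes)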

Please unsubscribe

2017-07-27 Thread babita Sancheti
Sent from my iPhone

A tool to generate simulation data

2017-07-27 Thread luohui20001
hello guys Is there a tool or an open source project that can mock a large amount of data quickly and support the following: 1. transaction data 2. time series data 3. data in a specified format such as CSV or JSON files 4. data generated at a changing speed 5. distributed data generation

Re: Complex types projection handling with Spark 2 SQL and Parquet

2017-07-27 Thread Patrick
Hi, I am having the same issue. Has anyone found a solution to this? When I convert the nested JSON to Parquet, I don't see the projection working correctly; it still reads all the nested structure columns. Parquet does support nested column projection. Does Spark 2 SQL provide the column
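
One way to see what is actually read, assuming a Parquet file that was written from nested JSON and contains a struct column (the path and column names below are placeholders); the ReadSchema entry in the physical plan shows whether only the selected nested field is requested from Parquet or the whole struct:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("nested-projection-check").getOrCreate()

    # Placeholder path for a Parquet dataset with a struct column such as
    # address: {city, zip}.
    df = spark.read.parquet("/tmp/nested_data.parquet")

    # explain() prints the plan; look at the ReadSchema of the Parquet scan
    # to check how much of the nested structure is being read.
    df.select("address.city").explain(True)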