Unusual bug, please help me, I can do nothing!!!

2022-03-30 Thread spark User
Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in Windows cmd. The first startup is normal, but when I use "Ctrl+C" to force-close the Spark window, it cannot start normally again. The error message is as follows "F

Error bug, please help me!!!

2022-03-20 Thread spark User
Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in Windows cmd. The first startup is normal, but when I use "Ctrl+C" to force-close the Spark window, it cannot start normally again. The error message is as follows "F

Re: Driver hung and ran out of memory while writing to console progress bar

2017-02-13 Thread Spark User
> > wrote: > >> the Spark version is 2.1.0 >> >> -- >> From: 方孝健(玄弟) <xiaojian@alibaba-inc.com> >> Sent: February 10, 2017 (Friday) 12:35 >> To: spark-dev <d...@spark.apache.org>; spark-user <user@spark.apache.org>

Re: Question about best Spark tuning

2017-02-13 Thread Spark User
My take on the 2-3 tasks per CPU core is that you want to ensure you are utilizing the cores to the max, which means it will help you with scaling and performance. The question would be why not 1 task per core? The reason is that you can probably get a good handle on the average execution time per
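As a rough illustration of that 2-3 tasks-per-core guideline, here is a minimal Scala sketch; the executor count, core count, and config values are assumptions for illustration, not numbers from this thread.

    // Minimal sketch of the 2-3 tasks-per-core rule of thumb (Spark 2.x).
    // Cluster sizes below are assumed for illustration only.
    import org.apache.spark.sql.SparkSession

    val numExecutors = 10      // assumed
    val coresPerExecutor = 4   // assumed
    val tasksPerCore = 3       // the 2-3x guideline from the reply above

    val parallelism = numExecutors * coresPerExecutor * tasksPerCore   // 120

    val spark = SparkSession.builder()
      .appName("parallelism-sketch")
      .config("spark.default.parallelism", parallelism.toString)       // RDD shuffles
      .config("spark.sql.shuffle.partitions", parallelism.toString)    // DataFrame/Dataset shuffles
      .getOrCreate()

With more tasks than cores, a slow or uneven task leaves the remaining cores busy instead of idle, which is the scaling benefit the reply describes.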

Re: Is it better to use Java or Python or Scala for Spark for big data sets

2017-02-13 Thread Spark User
Spark has more support for Scala; by that I mean more APIs are available for Scala compared to Python or Java. Also, Scala code will be more concise and easier to read. Java is very verbose. On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran wrote: > I would say Java, since it

Re: Performance bug in UDAF?

2017-02-09 Thread Spark User
one has solved similar problem. Thanks, Bharath On Mon, Oct 31, 2016 at 11:40 AM, Spark User <sparkuser2...@gmail.com> wrote: > Trying again. Hoping to find some help in figuring out the performance > bottleneck we are observing. > > Thanks, > Bharath > > On Sun, Oct

Potential memory leak in yarn ApplicationMaster

2016-11-21 Thread Spark User
Hi All, It seems like the heap usage for org.apache.spark.deploy.yarn.ApplicationMaster keeps growing continuously. The driver crashes with OOM eventually. More details: I have a spark streaming app that runs on spark-2.0. The spark.driver.memory is 10G and spark.yarn.driver.memoryOverhead is
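For context, a hedged sketch of how the two settings named above are typically wired up for a YARN-deployed streaming driver; the overhead value, app name, and batch interval are assumptions.

    // Sketch of the driver memory settings mentioned in the thread (values partly assumed).
    // In yarn-cluster mode these must be supplied at submit time (e.g. via --conf),
    // because the driver JVM is already sized before this code runs.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("streaming-app")                      // assumed name
      .set("spark.driver.memory", "10g")                // value from the thread
      .set("spark.yarn.driver.memoryOverhead", "2048")  // assumed value, in MB

    val ssc = new StreamingContext(conf, Seconds(10))   // assumed batch interval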

Re: Performance bug in UDAF?

2016-10-31 Thread Spark User
Trying again. Hoping to find some help in figuring out the performance bottleneck we are observing. Thanks, Bharath On Sun, Oct 30, 2016 at 11:58 AM, Spark User <sparkuser2...@gmail.com> wrote: > Hi All, > > I have a UDAF that seems to perform poorly when its input is skewed

Performance bug in UDAF?

2016-10-30 Thread Spark User
Hi All, I have a UDAF that seems to perform poorly when its input is skewed. I have been debugging the UDAF implementation but I don't see any code that is causing the performance to degrade. More details on the data and the experiments I have run. DataSet: Assume 3 columns, column1 being the
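Since the symptom points at skew rather than the aggregation code itself, a quick way to confirm is to look at the key distribution. A minimal sketch; the input path is a placeholder and "column1" is assumed to be the grouping key from the truncated description above.

    // Minimal skew check (Spark 2.x). "column1" is assumed to be the grouping key;
    // the input path is a placeholder.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.desc

    val spark = SparkSession.builder().appName("skew-check").getOrCreate()

    val ds = spark.read.parquet("/path/to/input")   // placeholder input
    ds.groupBy("column1")
      .count()
      .orderBy(desc("count"))
      .show(20)                                     // heavily skewed keys surface at the top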

RDD to Dataset results in fixed number of partitions

2016-10-21 Thread Spark User
Hi All, I'm trying to create a Dataset from RDD and do groupBy on the Dataset. The groupBy stage runs with 200 partitions. Although the RDD had 5000 partitions. I also seem to have no way to change that 200 partitions on the Dataset to some other large number. This seems to be affecting the
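The fixed 200 comes from spark.sql.shuffle.partitions (default 200), which Dataset/DataFrame shuffles use instead of the source RDD's partition count. A minimal sketch of raising it; the column name and the RDD are stand-ins for the ones described in the question, and 5000 mirrors the RDD partitioning mentioned above.

    // Minimal sketch: raise the shuffle partition count used by Dataset groupBy.
    // "rdd" stands for the existing 5000-partition RDD from the question.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("groupby-partitions")
      .config("spark.sql.shuffle.partitions", "5000")
      .getOrCreate()
    import spark.implicits._

    val ds = spark.createDataset(rdd)                   // rdd: RDD of a case class, from the question
    val counts = ds.groupBy($"keyAttr").count()         // "keyAttr" is a placeholder column name
    println(counts.rdd.getNumPartitions)                // now 5000 instead of the default 200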

Question about single/multi-pass execution in Spark-2.0 dataset/dataframe

2016-09-27 Thread Spark User
case class Record(keyAttr: String, attr1: String, attr2: String, attr3: String)
val ds = sparkSession.createDataset(rdd).as[Record]
val attr1Counts = ds.groupBy("keyAttr", "attr1").count()
val attr2Counts = ds.groupBy("keyAttr", "attr2").count()
val attr3Counts = ds.groupBy("keyAttr", "attr3").count()
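On the single- vs multi-pass question: each of the three results triggers its own job when it is acted on (shown, collected, or written), so without caching the source lineage is recomputed for every one. A minimal sketch of the usual fix, reusing the names from the snippet above:

    // Sketch: cache the Dataset so the three aggregation jobs reuse in-memory
    // partitions instead of recomputing the source for each action.
    import sparkSession.implicits._

    val cached = sparkSession.createDataset(rdd).as[Record].cache()

    val counts1 = cached.groupBy("keyAttr", "attr1").count()
    val counts2 = cached.groupBy("keyAttr", "attr2").count()
    val counts3 = cached.groupBy("keyAttr", "attr3").count()

    counts1.show()   // first action scans the source and populates the cache
    counts2.show()   // later jobs read the cached partitions
    counts3.show()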

Does DataFrame support CSV or Excel format?

2015-08-27 Thread spark user
Hi all, can we create a DataFrame from an Excel sheet or a CSV file? In the example below it seems only JSON is supported: DataFrame df = sqlContext.read().json("examples/src/main/resources/people.json");
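CSV is supported, though in Spark 1.x not through the built-in reader: it needs the external spark-csv package, while Spark 2.0+ ships a csv reader natively. There is no built-in Excel reader either way. A sketch in Scala; the file path is a placeholder.

    // Sketch: CSV loading in Spark 1.x via the external spark-csv package
    // (com.databricks:spark-csv must be on the classpath). Path is a placeholder.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("examples/src/main/resources/people.csv")

    // From Spark 2.0 on, the reader is built in:
    // val df2 = spark.read.option("header", "true").csv("examples/src/main/resources/people.csv")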

Re: Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread spark user
Hi Roberto, I have a question regarding HiveContext. When you create a HiveContext, where do you define the Hive connection properties? Suppose Hive is not on the local machine and I need to connect to it; how will the HiveContext know the database info such as URL, username, and password? String username = ; String
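In case it helps: a HiveContext normally picks up the metastore connection (URL, username, password) from a hive-site.xml placed in Spark's conf/ directory rather than from code, and a remote metastore can also be pointed at programmatically. A minimal sketch; the host, port, and table name are placeholders.

    // Sketch (Spark 1.3-era API): connect HiveContext to a remote Hive metastore.
    // Host, port, and table names are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-context-example"))
    val hiveContext = new HiveContext(sc)

    // Usually this lives in conf/hive-site.xml; it can also be set here:
    hiveContext.setConf("hive.metastore.uris", "thrift://remote-metastore-host:9083")

    hiveContext.sql("SELECT * FROM some_table LIMIT 10").show()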

Re: Java 8 vs Scala

2015-07-15 Thread spark user
I struggled a lot with Scala, almost 10 days with no improvement, but when I switched to Java 8 things were smooth, and I used DataFrames with Redshift and Hive and everything looked good. If you are very good in Scala then go with Scala; otherwise Java is the best fit. This is just my opinion because I am

Data Frame for nested json

2015-07-14 Thread spark user
Does DataFrame support nested JSON for dumping directly to a database? For simple JSON it is working fine: {"id":2,"name":"Gerald","email":"gbarn...@zimbio.com","city":"Štoky","country":"Czech Republic","ip":"92.158.154.75"}, but for nested JSON it fails to load. The printed schema: root |-- rows: array (nullable = true) |    |-- element:
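One hedged workaround, assuming the failure comes from writing the array-of-struct column straight to a relational table: explode the nested array and select scalar fields before the JDBC write. The field names below are guesses based on the printed schema, and the paths, URL, and table name are placeholders.

    // Sketch: flatten the nested "rows" array before writing to a database.
    // Paths, JDBC URL, and field names are placeholders.
    import org.apache.spark.sql.functions.explode

    val raw = sqlContext.read.json("people_nested.json")
    val flat = raw
      .select(explode(raw("rows")).as("r"))
      .select("r.id", "r.name", "r.email", "r.city", "r.country", "r.ip")

    flat.write
      .mode("append")
      .jdbc("jdbc:postgresql://dbhost:5432/people", "people_flat", new java.util.Properties())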

Java 8 vs Scala

2015-07-14 Thread spark user
Hi All, to start a new project in Spark, which technology is good: Java 8 or Scala? I am a Java developer. Can I start with Java 8, or do I need to learn Scala? Which one is the better technology for quickly starting any POC project? Thanks - su

Re: spark - redshift !!!

2015-07-08 Thread spark user
/Shahab On Wed, Jul 8, 2015 at 12:57 AM, spark user <spark_u...@yahoo.com.invalid> wrote: > Hi, can you help me with how to load data from an S3 bucket to Redshift? If you have sample code, can you please send it to me? Thanks, su

spark - redshift !!!

2015-07-07 Thread spark user
Hi, can you help me with how to load data from an S3 bucket to Redshift? If you have sample code, can you please send it to me? Thanks, su
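One common route, as a hedged sketch: the spark-redshift connector (com.databricks:spark-redshift) stages the DataFrame in S3 and issues a Redshift COPY under the hood. Every host, bucket, table, and credential below is a placeholder.

    // Sketch: write a DataFrame built from S3 data into Redshift via spark-redshift.
    // All hosts, buckets, tables, and credentials are placeholders.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("s3n://my-bucket/input/")                  // source data in S3

    df.write
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://redshift-host:5439/mydb?user=USER&password=PASS")
      .option("dbtable", "target_table")
      .option("tempdir", "s3n://my-bucket/tmp/")       // staging area for the COPY
      .mode("error")
      .save()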

Re: s3 bucket access/read file

2015-06-29 Thread spark user
: S3 HEAD request failed for '/user%2Fdidi' - ResponseCode=400, ResponseMessage=Bad Request. What does the user have to do here??? I am using key and secret!!! How can I simply create an RDD from a text file on S3? Thanks, Didi
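For the basic case, a minimal sketch of reading a text file from S3 with explicit credentials (Spark 1.x, s3n scheme); the keys and path are placeholders. One check frequently suggested on this list is whether the secret key contains '/' characters, which the old s3n client reportedly handled badly.

    // Sketch: read a text file from S3 with explicit credentials (Spark 1.x, s3n).
    // The access key, secret key, and path are placeholders.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    val lines = sc.textFile("s3n://my-bucket/path/to/file.txt")
    println(lines.count())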

Re: Scala/Python or Java

2015-06-25 Thread spark user
commonly used language in the production environment. Learning Scala requires some time. If you're very comfortable with Java / Python, you can go with that while at the same time familiarizing yourself with Scala. Cheers On Thu, Jun 25, 2015 at 12:04 PM, spark user spark_u...@yahoo.com.invalid

Scala/Python or Java

2015-06-25 Thread spark user
Hi All, I am new to Spark, and I just want to know which technology is good/best for learning Spark: 1) Scala 2) Java 3) Python. I know Spark supports all 3 languages, but which one is best? Thanks, su