Re: Unsubscribe

2018-01-18 Thread Yash Sharma
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe. Cheers. On Fri., 19 Jan. 2018, 5:11 pm Anu B Nair, wrote: >

Re: Unsubscribe

2018-01-18 Thread Yash Sharma
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe. Cheers. On Fri., 19 Jan. 2018, 5:28 pm Sbf xyz, wrote: >

Re: Quick one... AWS SDK version?

2017-10-03 Thread Yash Sharma
Hi JG, Here are my cluster configs if it helps. Cheers.
EMR: emr-5.8.0
Hadoop distribution: Amazon 2.7.3
AWS SDK: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar
Applications: Hive 2.3.0, Spark 2.2.0, Tez 0.8.4
On Tue, 3 Oct 2017 at 12:29 JG Perrin wrote: > Hey

Re: Error while reading the CSV

2017-04-07 Thread Yash Sharma
nayan sharma <nayansharm...@gmail.com> wrote: > Hi Yash, > I know this will work perfectly but here I wanted to read the csv using the > assembly jar file. > > Thanks, > Nayan > > On 07-Apr-2017, at 10:02 AM, Yash Sharma <yash...@gmail.com> wrote: > > Hi Nayan, >

Re: distinct query getting stuck at ShuffleBlockFetcherIterator

2017-04-06 Thread Yash Sharma
Hi Ramesh, Could you share some logs please? A pastebin? A DAG view? Did you check for GC pauses, if any? On Thu, 6 Apr 2017 at 21:55 Ramesh Krishnan wrote: > I have a use case of distinct on a dataframe. When i run the application > is getting stuck at LINE

Re: Error while reading the CSV

2017-04-06 Thread Yash Sharma
Hi Nayan, I use --packages with both spark-shell and spark-submit. Could you please try that and let us know.

Command: spark-submit --packages com.databricks:spark-csv_2.11:1.4.0

On Fri, 7 Apr 2017 at 00:39 nayan sharma wrote: > spark version 1.6.2 > scala
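
For reference, a minimal sketch of reading a CSV this way on Spark 1.6 with spark-csv; the input path and options here are placeholders, not from the original thread:

    // Sketch only: Spark 1.6 + spark-csv, assuming an existing SparkContext `sc`.
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")         // first line is a header
      .option("inferSchema", "true")    // infer column types from the data
      .load("s3://bucket/path/data.csv")  // placeholder path
    df.show()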

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread Yash Sharma
Hi Shyla, we could suggest more based on what exactly you're trying to do. But with the given information: if you have your Spark job ready, you could schedule it via any scheduling framework like Airflow, Celery, or cron, depending on how simple or complex you want your workflow to be. Cheers, Yash On

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-24 Thread Yash Sharma
too many small files you are trying to read? Number of > executors are very high > On 24 Sep 2016 10:28, "Yash Sharma" <yash...@gmail.com> wrote: > >> Have been playing around with configs to crack this. Adding them here >> where it would be helpful to others :)

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
n them reasonable > memory. This can be around 48 assuming 12 nodes x 4 cores each. You could > start with processing a subset of your data and see if you are able to get > a decent performance. Then gradually increase the maximum # of execs for > dynamic allocation and process the remaining

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
:27 AM, Yash Sharma <yash...@gmail.com> wrote: > Have been playing around with configs to crack this. Adding them here > where it would be helpful to others :) > Number of executors and timeout seemed like the core issue. > > {code} > --driver-memory 4G \ > --conf spar
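
The {code} block above is cut off in the archive. As a rough sketch of the same idea (illustrative values, not the original flags), the settings can also be expressed through SparkConf:

    // Sketch only; values are illustrative, not the configs from the original post.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.maxExecutors", "48")  // cap runaway executor requests
      .set("spark.network.timeout", "600s")               // generous timeout for busy clusters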

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
with fix number of executors and try. May > be 12 executors for testing and let know the status. > > Get Outlook for Android <https://aka.ms/ghei36> > > > > On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma" <yash...@gmail.com> > wrote: > > Than

Re: Spark SQL overwrite/append for partitioned tables

2016-07-25 Thread Yash Sharma
Correction -

dataDF.write.partitionBy("year", "month", "date").mode(SaveMode.Append).text("s3://data/test2/events/")

On Tue, Jul 26, 2016 at 10:59 AM, Yash Sharma <yash...@gmail.com> wrote: > Based on the behavior of spark [1], Overwrite mode will delete all your > data when you try

Re: Spark SQL overwrite/append for partitioned tables

2016-07-25 Thread Yash Sharma
Based on the behavior of spark [1], Overwrite mode will delete all your data when you try to overwrite a particular partition. What I did (see the sketch below):
- Use the S3 API to delete all partitions
- Use the Spark DataFrame to write in Append mode [2]

1.
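
A sketch of that delete-then-append pattern, using the Hadoop FileSystem API in place of a raw S3 client; `dataDF` and `sc` are assumed to exist, and the partition values are placeholders:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SaveMode

    // Delete only the target partition (placeholder values), leaving others intact.
    val fs = FileSystem.get(new URI("s3://data"), sc.hadoopConfiguration)
    fs.delete(new Path("s3://data/test2/events/year=2016/month=07/date=26"), true)

    // Then append, so the remaining partitions are untouched.
    dataDF.write
      .partitionBy("year", "month", "date")
      .mode(SaveMode.Append)
      .text("s3://data/test2/events/")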

Re: Streaming from Kinesis is not getting data in Yarn cluster

2016-07-15 Thread Yash Sharma
I struggled with Kinesis for a long time and got all my findings documented at - http://stackoverflow.com/questions/35567440/spark-not-able-to-fetch-events-from-amazon-kinesis Let me know if it helps. Cheers, Yash - Thanks, via mobile, excuse brevity. On Jul 16, 2016 6:05 AM, "dharmendra"

Re: Error in Spark job

2016-07-12 Thread Yash Sharma
Looks like the write to Aerospike is taking too long. Could you try writing the RDD directly to the filesystem, skipping the Aerospike write?

foreachPartition at WriteToAerospike.java:47, took 338.345827 s

- Thanks, via mobile, excuse brevity. On Jul 12, 2016 8:08 PM, "Saurav Sinha"
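
For that experiment, a one-line sketch, assuming an RDD named `rdd` and a hypothetical output path:

    // Bypass Aerospike and dump the same data to HDFS to isolate the slow stage.
    rdd.saveAsTextFile("hdfs:///tmp/aerospike-debug-output")  // placeholder path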

Re: Spark SQL: Merge Arrays/Sets

2016-07-11 Thread Yash Sharma
This answers exactly what you are looking for - http://stackoverflow.com/a/34204640/1562474 On Tue, Jul 12, 2016 at 6:40 AM, Pedro Rodriguez wrote: > Is it possible with Spark SQL to merge columns whose types are Arrays or > Sets? > > My use case would be something
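
One common approach (a sketch, not necessarily what the linked answer does) is a UDF that concatenates and de-duplicates two array columns; `df` and the column names are placeholders:

    import org.apache.spark.sql.functions.udf

    // Merge two array<string> columns into one de-duplicated array.
    val mergeArrays = udf { (a: Seq[String], b: Seq[String]) => (a ++ b).distinct }
    val merged = df.withColumn("merged", mergeArrays(df("col1"), df("col2")))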

Re: Fast database with writes per second and horizontal scaling

2016-07-11 Thread Yash Sharma
Spark is more of an execution engine rather than a database. Hive is a data warehouse but I still like treating it as an execution engine. For databases, You could compare HBase and Cassandra as they both have very wide usage and proven performance. We have used Cassandra in the past and were

Re: Spark cluster tuning recommendation

2016-07-11 Thread Yash Sharma
I would say use dynamic allocation rather than a fixed number of executors, and provide whatever executor memory you would like. Deciding the values requires a couple of test runs and checking what works best for you. You could try something like (see the sketch below) - --driver-memory 1G \ --executor-memory 2G \
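
The flag list above is truncated in the archive. As a hedged sketch of the general shape (illustrative values only, not the original recommendation), the equivalent via SparkConf:

    // Illustrative starting point only; tune per workload.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "2g")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")  // needed for dynamic allocation on YARN
    // Note: driver memory generally must be set before the JVM starts,
    // i.e. via the spark-submit flag rather than SparkConf.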

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
with a non-zero exit code 1
> Failing this attempt. Failing the application.
>
> but the command gets an error
>
> shihj@master:~/workspace/hadoop-2.6.4$ yarn logs -applicationId application_1466568126079_0006
> Usage: yarn [options]
>
> yarn: error: no such option: -a
>

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
ially jar package), because > they are very big, the application will wait too long; is there a better method? > So I set that parameter, but did not get the effect I wanted. > > > -- Original Message ------ > *From:* "Yash Sharma";<yash...@gmail.com>; > *Sent:*

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.mai

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> -- Original Message ------
> *From:* "Yash Sharma";<yash...@gmail.com>;
> *Sent:* June 22, 2016 (Wednesday)

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
.jar On Wed, Jun 22, 2016 at 4:27 PM, 另一片天 <958943...@qq.com> wrote: > Is it able to run in local mode? > > What do you mean? standalone mode? > > > -- Original Message ------ > *From:* "Yash Sharma";<yash...@gmail.com>; > *Sent:* June 22, 2016

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
aster:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar > shihj@master:~/workspace/hadoop-2.6.4$ > I can find the jar on all nodes. > > > -- Original Message ------ > *From:* "Yash Sharma";<yash...@gmail.com>; > *Sent:* June 22, 2016 (Wednesday) PM

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
SparkSubmit.scala:206)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> I get the error at once
>> -- Original Message ------
>> *From:* "Yash Sharma";<yash...@gmail.com&

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
How about supplying the jar directly in spark submit -

./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn-client \
> --driver-memory 512m \
> --num-executors 2 \
> --executor-memory 512m \
> --executor-cores 2 \
>

Re: Python to Scala

2016-06-18 Thread Yash Sharma
Spark on Scala, I code in Scala for Spark, >>> though am new, but I know it and still learning. But I need help in >>> converting this code to Scala. I've nearly no knowledge in Python, hence, >>> requested the experts here. >>> >>> Hope you get me

Re: Python to Scala

2016-06-17 Thread Yash Sharma
You could use PySpark to run the Python code on Spark directly. That will cut the effort of learning Scala. https://spark.apache.org/docs/0.9.0/python-programming-guide.html - Thanks, via mobile, excuse brevity. On Jun 18, 2016 2:34 PM, "Aakash Basu" wrote: > Hi all, > >

Re: StackOverflow in Spark

2016-06-01 Thread Yash Sharma
Not sure if it's related, but I got a similar StackOverflowError some time back while reading files and converting them to Parquet.
> Stack trace-
> 16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of 32769 executor(s).
> 16/06/02 02:23:54 INFO ExecutorAllocationManager:

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-25 Thread Yash Sharma
the > spark checkpoints. Streaming context is getting prepared from the > checkpoint directory and started consuming from the topic offsets which > were stored in checkpoint directory. > > > On Sat, Jan 23, 2016 at 3:44 PM, Yash Sharma <yash...@gmail.com> wrote: > &

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-23 Thread Yash Sharma
Hi Raju, Could you please explain your expected behavior with the DStream? The DStream will have events only from the 'fromOffsets' that you provided in createDirectStream (which I think is the expected behavior). As for the smaller files: you will have to deal with them if you intend to
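
For reference, a sketch of the Spark 1.x direct-stream API with explicit fromOffsets; the topic, partition, offset, and broker values are placeholders, and `ssc` is assumed to be an existing StreamingContext:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Start each partition from an explicit offset (placeholder values).
    val fromOffsets = Map(TopicAndPartition("events", 0) -> 1000L)
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc,
      Map("metadata.broker.list" -> "broker:9092"),
      fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
    )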

Re: Writing partitioned Avro data to HDFS

2015-12-22 Thread Yash Sharma
Hi Jan, Is the error because a past run of the job has already written to the location? In that case you can add more granularity with 'time' along with year and month. That should give you a distinct path for every run. Let us know if it helps or if I missed anything. Good luck - Thanks, via
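
A sketch of adding that extra granularity with spark-avro; `dataDF`, the run_time column name, and the output path are placeholders, assuming the year/month partition columns from the thread:

    import com.databricks.spark.avro._  // provides .avro(...) on DataFrameWriter
    import org.apache.spark.sql.functions.lit

    // Tag each run with a distinct timestamp so runs never collide on the same path.
    val runId = new java.text.SimpleDateFormat("yyyyMMddHHmmss").format(new java.util.Date())
    dataDF.withColumn("run_time", lit(runId))
      .write
      .partitionBy("year", "month", "run_time")
      .avro("hdfs:///data/events")  // placeholder path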

Re: Apache spark certification pass percentage ?

2015-12-22 Thread Yash Sharma
Hi Sri, That would depend on the organization through which you are taking the certification. This list is better suited to questions and information about using Spark and/or contributing to Spark. Good luck - Thanks, via mobile, excuse brevity. On Dec 22, 2015 3:56

Re: Client session timed out, have not heard from server in

2015-12-22 Thread Yash Sharma
Hi Evan, SPARK-9629 referred to connection issues with ZooKeeper. Could you check if it's working fine in your setup? Also please share any other error logs you might be getting. - Thanks, via mobile, excuse brevity. On Dec 22, 2015 5:00 PM, "yaoxiaohua" wrote: > Hi, > >

Re: Client session timed out, have not heard from server in

2015-12-22 Thread Yash Sharma
like this:
>>
>> INFO ClientCnxn: Client session timed out, have not heard from server in 40015ms for sessionid 0x351c416297a145a, closing socket connection and attempting reconnect
>>
>> Before the spark2 master process shut down.
>>
>> I

Re: Writing partitioned Avro data to HDFS

2015-12-22 Thread Yash Sharma
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
> at com.databricks.spark.avro.package$AvroDataFrameWriter$$anonfun$avro$1.apply(package.scala:37)
> at com.databricks.spark.avro.package$AvroDataFrameWriter$$anonfun$avro$1.apply(package.scala:37)
>

Re: Writing partitioned Avro data to HDFS

2015-12-22 Thread Yash Sharma
sRelation.scala:76)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
> at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:6