Spark on Kubernetes

2024-04-29 Thread Tarun raghav
Respected Sir/Madam, I am Tarunraghav. I have a query regarding Spark on Kubernetes. We have an EKS cluster, within which we have Spark installed in the pods. We set the executor memory to 1GB and the executor instances to 2, and I have also set dynamic allocation to true. So when I try to read a
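A minimal sketch of the configuration this question describes (the app name is hypothetical, and the exact values mirror the thread, not any recommendation). One detail worth noting: Kubernetes has no external shuffle service, so dynamic allocation additionally needs shuffle tracking enabled, and with dynamic allocation on, spark.executor.instances only sets the initial executor count:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("eks-example") // hypothetical name
      .config("spark.executor.memory", "1g")
      .config("spark.executor.instances", "2") // initial count when dynamic allocation is on
      .config("spark.dynamicAllocation.enabled", "true")
      // Required on Kubernetes, which has no external shuffle service:
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .getOrCreate()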

Re: newbie question about RDD

2016-11-21 Thread Raghav
Sorry, I forgot to ask: how can I use the Spark context here? I have the HDFS directory path of the files, as well as the name node of the HDFS cluster. Thanks for your help. On Mon, Nov 21, 2016 at 9:45 PM, Raghav <raghavas...@gmail.com> wrote: > Hi > > I am extremely new to Spark. I have
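For context, a minimal Scala sketch of reading from HDFS given a name node and a directory path (host, port, and path are placeholders; 8020 is the default NameNode RPC port):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("hdfs-read-example") // hypothetical name
    val sc = new SparkContext(conf)

    // textFile accepts a directory path and reads every file under it.
    val lines = sc.textFile("hdfs://namenode-host:8020/path/to/files")
    println(lines.count())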

newbie question about RDD

2016-11-21 Thread Raghav
in HDFS is as follows:

UUID  FirstName  LastName  Zip
7462  John       Doll      06903
5231  Brad       Finley    32820

Can someone show me how to get a JavaRDD object by reading the file in HDFS? Thanks. -- Raghav
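A sketch of one way to parse those lines, in Scala for brevity (the Java API is analogous via JavaSparkContext.textFile and map); the field names follow the header shown above, and the path is a placeholder:

    // sc: an existing SparkContext (see the sketch in the reply above)
    case class Person(uuid: String, firstName: String, lastName: String, zip: String) // zip as String keeps the leading zero in 06903

    val people = sc.textFile("hdfs://namenode-host:8020/path/to/file")
      .map(_.trim.split("\\s+"))                    // assumes whitespace-separated columns
      .filter(f => f.length == 4 && f(0) != "UUID") // drop the header and malformed rows
      .map(f => Person(f(0), f(1), f(2), f(3)))     // people: RDD[Person]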

Kafka Producer within a docker Instance

2016-11-11 Thread Raghav
to both Spark and Kafka, and looking for some pointers to start exploring. Thanks. -- Raghav
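For anyone landing on this thread, a minimal Kafka producer sketch in Scala (broker address and topic are placeholders; from inside a Docker container, the address in bootstrap.servers must match a listener the broker actually advertises to containers):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker:9092") // placeholder; must be reachable from the container
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("my-topic", "key", "hello"))
    producer.close()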

Re: Newbie question - Best way to bootstrap with Spark

2016-11-07 Thread Raghav
Thanks a ton, guys. On Sun, Nov 6, 2016 at 4:57 PM, raghav <raghavas...@gmail.com> wrote: > I am newbie in the world of big data analytics, and I want to teach myself > Apache Spark, and want to be able to write scripts to tinker with data. > > I have some understanding of M

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Raghav
Spark Summit > 2014, 2015 and 2016 videos. Regarding practice, I would strongly suggest > Databricks cloud (or download a prebuilt Spark from the Spark site). You can also take > courses from edX/Berkeley, which are very good starter courses. > > On Mon, Nov 7, 2016 at 11:57 AM, raghav <raghavas.

Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread raghav
for some guidance on starter material or videos. Thanks. Raghav -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Submitting Spark Applications using Spark Submit

2015-06-20 Thread Raghav Shankar
-xxx.compute-1.amazonaws.com/10.165.103.16:7077 from. I never specify that in the master URL command-line parameter. Any ideas on what I might be doing wrong? On Jun 19, 2015, at 7:19 PM, Andrew Or and...@databricks.com wrote: Hi Raghav, I'm assuming you're using standalone mode. When using
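For reference: in standalone mode, the master URL the application uses must exactly match the spark:// URL shown at the top of the master's web UI, hostname and all. A sketch with a placeholder host:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("standalone-example")            // hypothetical name
      .setMaster("spark://<master-hostname>:7077") // placeholder; copy verbatim from the master web UI
    val sc = new SparkContext(conf)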

Re: Submitting Spark Applications using Spark Submit

2015-06-19 Thread Raghav Shankar
Thanks Andrew! Is this all I have to do when using the spark-ec2 script to set up a Spark cluster? It seems to be getting an assembly jar that is not from my project (perhaps from a Maven repo). Is there a way to make the ec2 script use the assembly jar that I created? Thanks, Raghav On Friday

Re: Implementing top() using treeReduce()

2015-06-17 Thread Raghav Shankar
setup scripts, it sets up Spark, but I think my custom-built spark-core jar is not being used. How do I set it up on EC2 so that my custom version of spark-core is used? Thanks, Raghav On Jun 9, 2015, at 7:41 PM, DB Tsai dbt...@dbtsai.com wrote: Having the following code in RDD.scala works for me
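Since this thread is about building top() from treeReduce(), here is a self-contained sketch that avoids patching RDD.scala at all (the function treeTop and its placement are invented for illustration): take the per-partition top num, then merge the candidates tree-style so no single machine has to receive every partition's result at once.

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    def treeTop[T: ClassTag](rdd: RDD[T], num: Int)(implicit ord: Ordering[T]): Array[T] =
      rdd.mapPartitions(it => Iterator.single(it.toArray.sorted(ord.reverse).take(num))) // per-partition top num
         .treeReduce((a, b) => (a ++ b).sorted(ord.reverse).take(num))                   // merge candidates in a tree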

Re: Implementing top() using treeReduce()

2015-06-17 Thread Raghav Shankar
So, would I add the assembly jar to just the master, or would I have to add it to all the slaves/workers too? Thanks, Raghav On Jun 17, 2015, at 5:13 PM, DB Tsai dbt...@dbtsai.com wrote: You need to build the Spark assembly with your modification and deploy it into the cluster. Sincerely

Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
script will upload this jar to the YARN cluster automatically, and then you can run your application as usual. It does not care about which version of Spark is in your YARN cluster. 2015-06-17 10:42 GMT+08:00 Raghav Shankar raghav0110...@gmail.com

Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
would be very useful. Thanks, Raghav On Jun 16, 2015, at 6:57 PM, Will Briggs wrbri...@gmail.com wrote: In general, you should avoid making direct changes to the Spark source code. If you are using Scala, you can seamlessly blend your own methods on top of the base RDDs using implicit
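A small sketch of the implicit-class approach Will is describing (the method and its name are invented for illustration):

    import org.apache.spark.rdd.RDD

    object RddExtensions {
      // Importing RddExtensions._ makes topByLength available on any RDD[String],
      // with no change to Spark's own source code.
      implicit class RichStringRdd(val rdd: RDD[String]) extends AnyVal {
        def topByLength(n: Int): Array[String] = rdd.top(n)(Ordering.by(_.length))
      }
    }

With import RddExtensions._ in scope, any RDD[String] gains rdd.topByLength(5) as if it were a built-in method.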

Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
that be enough to tell Spark to use that spark-core jar instead of the default? Thanks, Raghav On Jun 16, 2015, at 7:19 PM, Will Briggs wrbri...@gmail.com wrote: If this is research-only, and you don't want to have to worry about updating the jars installed by default on the cluster, you can add your
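The reply is cut off, but one mechanism that fits what it describes (an assumption on my part, not a quote from the thread) is Spark's experimental userClassPathFirst settings, combined with shipping the jar yourself:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("custom-core-example") // hypothetical name
      // Experimental flags: prefer user-supplied jars over Spark's own classes.
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")
    val sc = new SparkContext(conf)
    sc.addJar("/path/to/custom-spark-core.jar") // placeholder path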

Re: Different Sorting RDD methods in Apache Spark

2015-06-09 Thread Raghav Shankar
the entire data and collecting it on the driver node is not a typical use case? If I want to do this using sortBy(), I would first call sortBy() followed by a collect(), and collect() would involve gathering all the data on a single machine as well. Thanks, Raghav On Tuesday, June 9, 2015, Mark Hamstra m
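For comparison, a sketch of the two routes being discussed; top(n) does the per-partition pruning itself, so at most n candidates per partition ever reach the driver:

    // rdd: an existing RDD[Int] (assumption for illustration)
    val sortedAll  = rdd.sortBy(identity, ascending = false).collect() // full sort, then everything to the driver
    val largestTen = rdd.top(10)                                       // only 10 candidates per partition reach the driver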

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Raghav Shankar
provide some insight into this? Thanks, Raghav On Thursday, June 4, 2015, Reza Zadeh r...@databricks.com wrote: In a regular reduce, all partitions have to send their reduced value to a single machine, and that machine can become a bottleneck. In a treeReduce, the partitions talk to each other
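Reza's point, as a sketch: both calls below compute the same sum, but treeReduce merges partial results in rounds of a configurable depth instead of funneling every partition's value straight to one machine:

    // rdd: an existing RDD[Long] (assumption for illustration)
    val flatSum = rdd.reduce(_ + _)                // every partition's partial sum goes to a single place
    val treeSum = rdd.treeReduce(_ + _, depth = 2) // partial sums merged in a two-level tree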

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Raghav Shankar
--- Blog: https://www.dbtsai.com On Thu, Jun 4, 2015 at 10:46 AM, Raghav Shankar raghav0110...@gmail.com wrote: Hey Reza, Thanks for your response! Your response clarifies some of my initial thoughts. However, what I don't

Re: Task result in Spark Worker Node

2015-04-17 Thread Raghav Shankar
send the serialized version of the RDD and function to my other program. My thought is that I might need to add more jars to the build path, but I have no clue if that's the issue and what jars I need to add. Thanks, Raghav On Apr 13, 2015, at 10:22 PM, Imran Rashid iras...@cloudera.com wrote

Re: Task result in Spark Worker Node

2015-04-17 Thread Raghav Shankar
) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) On Apr 17, 2015, at 2:30 AM, Raghav Shankar raghav0110...@gmail.com wrote: Hey Imran, Thanks for the great explanation! This cleared up a lot of things for me. I am actually trying to utilize some of the features within Spark

Re: Sending RDD object over the network

2015-04-06 Thread Raghav Shankar
object to my second program? Thanks, Raghav On Mon, Apr 6, 2015 at 3:08 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Are you expecting to receive 1 to 100 values in your second program? An RDD is just an abstraction; you would need to do something like: num.foreach(x => send(x)) Thanks Best Regards
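A fuller sketch of Akhil's suggestion: collect the values to the driver and ship them with ordinary Java serialization, since the RDD object itself is only a handle on distributed data (the host, port, and socket plumbing here are invented for illustration):

    import java.io.ObjectOutputStream
    import java.net.Socket

    // num: an existing RDD[Int], as in the thread; host and port are placeholders.
    val values: Array[Int] = num.collect()

    val socket = new Socket("second-program-host", 9999)
    val out = new ObjectOutputStream(socket.getOutputStream)
    out.writeObject(values) // the plain array serializes cleanly; the RDD handle would be useless to the receiver
    out.close()
    socket.close()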