Respected Sir/Madam,
I am Tarunraghav, and I have a query regarding Spark on Kubernetes.
We have an EKS cluster, within which we have Spark installed in the pods.
We set the executor memory to 1 GB and the executor instances to 2, and I
have also set dynamic allocation to true. So when I try to read a
Sorry, I forgot to ask: how can I use the SparkContext here? I have the HDFS
directory path of the files, as well as the namenode of the HDFS cluster.
Thanks for your help.
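(A minimal Scala sketch of reading an HDFS directory through the SparkContext; the namenode host, port 8020, and path below are placeholders rather than values taken from this thread.)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("ReadFromHdfs")
    val sc = new SparkContext(conf)

    // textFile accepts a full HDFS URI and reads every file under the
    // directory as an RDD[String] of lines.
    val lines = sc.textFile("hdfs://namenode-host:8020/path/to/files")
    println(s"line count: ${lines.count()}")

With dynamic allocation enabled, executors are requested as tasks from the read become pending, up to the configured maximum.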
On Mon, Nov 21, 2016 at 9:45 PM, Raghav <raghavas...@gmail.com> wrote:
> Hi
>
> I am extremely new to Spark. I have a file in HDFS whose format is as follows:
UUID FirstName LastName Zip
7462 John Doll 06903
5231 Brad Finley 32820
Can someone point me to how to get a JavaRDD object by reading the
file in HDFS?
Thanks.
--
Raghav
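(A minimal Scala sketch of parsing that file into an RDD of records; the Java API is analogous, with JavaSparkContext.textFile returning a JavaRDD<String> that can then be mapped the same way. The HDFS path is a placeholder, and the schema comes from the sample rows above.)

    import org.apache.spark.{SparkConf, SparkContext}

    // One record per line: UUID, FirstName, LastName, Zip (whitespace-separated).
    case class Person(uuid: String, firstName: String, lastName: String, zip: String)

    val sc = new SparkContext(new SparkConf().setAppName("ReadPeople"))

    // Placeholder path; textFile yields an RDD[String], one element per line.
    val lines = sc.textFile("hdfs://namenode-host:8020/path/to/people.txt")

    // Drop the header row, split on whitespace, and keep only well-formed rows.
    val people = lines
      .filter(line => !line.startsWith("UUID"))
      .map(_.trim.split("\\s+"))
      .filter(_.length == 4)
      .map(f => Person(f(0), f(1), f(2), f(3)))

    people.take(2).foreach(println)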
to both Spark and Kafka, and looking
for some pointers to start exploring.
Thanks.
--
Raghav
Thanks a ton, guys.
On Sun, Nov 6, 2016 at 4:57 PM, raghav <raghavas...@gmail.com> wrote:
> I am a newbie in the world of big data analytics, and I want to teach myself
> Apache Spark and be able to write scripts to tinker with data.
>
> I have some understanding of M
Spark Summit
> 2014, 2015, and 2016 videos. Regarding practice, I would strongly suggest
> Databricks Cloud (or download a prebuilt package from the Spark site). You can also
> take courses from edX/Berkeley, which are very good starter courses.
>
> On Mon, Nov 7, 2016 at 11:57 AM, raghav <raghavas.
for some guidance on starter material or videos.
Thanks.
Raghav
-xxx.compute-1.amazonaws.com/10.165.103.16:7077 from. I never
specify that in the master URL command-line parameter. Any ideas on what I
might be doing wrong?
On Jun 19, 2015, at 7:19 PM, Andrew Or and...@databricks.com wrote:
Hi Raghav,
I'm assuming you're using standalone mode. When using
Thanks, Andrew! Is this all I have to do when using the spark-ec2 script to
set up a Spark cluster? It seems to be getting an assembly jar that is not
from my project (perhaps from a Maven repo). Is there a way to make the spark-ec2
script use the assembly jar that I created?
Thanks,
Raghav
On Friday
setup scripts, it
sets up Spark, but I think my custom-built spark-core jar is not being used.
How do I set it up on EC2 so that my custom version of spark-core is used?
Thanks,
Raghav
On Jun 9, 2015, at 7:41 PM, DB Tsai dbt...@dbtsai.com wrote:
Having the following code in RDD.scala works for me
So, would I add the assembly jar to just the master, or would I have to add
it to all the slaves/workers too?
Thanks,
Raghav
On Jun 17, 2015, at 5:13 PM, DB Tsai dbt...@dbtsai.com wrote:
You need to build the Spark assembly with your modification and deploy it
into the cluster.
Sincerely
script will upload this jar to the YARN cluster automatically,
and then you can run your application as usual.
It does not care which version of Spark is in your YARN cluster.
2015-06-17 10:42 GMT+08:00 Raghav Shankar raghav0110...@gmail.com
would be very useful.
Thanks,
Raghav
On Jun 16, 2015, at 6:57 PM, Will Briggs wrbri...@gmail.com wrote:
In general, you should avoid making direct changes to the Spark source code.
If you are using Scala, you can seamlessly blend your own methods on top of
the base RDDs using implicit conversions.
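(A minimal sketch of that pattern: an implicit class that pins an extra method onto any RDD without touching Spark's source. The method name and logic here are invented for illustration.)

    import org.apache.spark.rdd.RDD

    object RddExtensions {
      // Any RDD in scope gains countDistinct() once this implicit is imported.
      implicit class RichRDD[T](rdd: RDD[T]) {
        def countDistinct(): Long = rdd.distinct().count()
      }
    }

    // Usage, e.g. in spark-shell:
    //   import RddExtensions._
    //   val n = sc.parallelize(Seq(1, 2, 2, 3)).countDistinct()   // 3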
Would that be enough to tell Spark to use that spark-core jar instead of the default?
Thanks,
Raghav
On Jun 16, 2015, at 7:19 PM, Will Briggs wrbri...@gmail.com wrote:
If this is research-only, and you don't want to have to worry about updating
the jars installed by default on the cluster, you can add your
the entire data and
collecting it on the driver node is not a typical use case? If I want to do
this using sortBy(), I would first call sortBy() followed by a collect().
A collect() would involve gathering all the data on a single machine as well.
Thanks,
Raghav
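(A small sketch of the two call patterns under discussion; takeOrdered is shown only as the usual bounded alternative when the full sorted dataset is not actually needed on the driver.)

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("SortCollect"))
    val nums = sc.parallelize(Seq(5, 3, 9, 1, 7))

    // sortBy produces a sorted RDD across the cluster; collect() then pulls
    // every element back to the driver, so the whole dataset must fit there.
    val allSorted: Array[Int] = nums.sortBy(identity).collect()

    // If only the smallest k elements are needed, takeOrdered(k) bounds what
    // reaches the driver.
    val smallestThree: Array[Int] = nums.takeOrdered(3)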
On Tuesday, June 9, 2015, Mark Hamstra m
provide some insight into this?
Thanks,
Raghav
On Thursday, June 4, 2015, Reza Zadeh r...@databricks.com wrote:
In a regular reduce, all partitions have to send their reduced value to a
single machine, and that machine can become a bottleneck.
In a treeReduce, the partitions talk to each other and combine partial results in multiple levels on the executors, so far less data has to reach the driver at once.
---
Blog: https://www.dbtsai.com
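(A minimal sketch contrasting the two calls; both are standard RDD operations, and depth = 2 is simply treeReduce's default.)

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("TreeReduceExample"))
    val nums = sc.parallelize(1 to 1000, 100)

    // reduce: every partition's partial result goes straight to the driver,
    // which can become a bottleneck with many partitions.
    val total1 = nums.reduce(_ + _)

    // treeReduce: partial results are first combined on executors in a
    // multi-level tree, so far fewer values reach the driver.
    val total2 = nums.treeReduce(_ + _, depth = 2)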
On Thu, Jun 4, 2015 at 10:46 AM, Raghav Shankar raghav0110...@gmail.com wrote:
Hey Reza,
Thanks for your response!
Your response clarifies some of my initial thoughts. However, what I
don't
send the serialized version of the RDD and function to my other
program. My thought is that I might need to add more jars to the build path,
but I have no clue whether that's the issue or which jars I need to add.
Thanks,
Raghav
On Apr 13, 2015, at 10:22 PM, Imran Rashid iras...@cloudera.com wrote:
)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
On Apr 17, 2015, at 2:30 AM, Raghav Shankar raghav0110...@gmail.com wrote:
Hey Imran,
Thanks for the great explanation! This cleared up a lot of things for me. I
am actually trying to utilize some of the features within Spark
object to my
second program?
Thanks,
Raghav
On Mon, Apr 6, 2015 at 3:08 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Are you expecting to receive 1 to 100 values in your second program?
RDD is just an abstraction; you would need to do something like:
num.foreach(x => send(x))
Thanks
Best Regards
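(A sketch of that pattern; the println stands in for whatever call actually pushes a value to the second program, and note that the closure runs on the executors rather than the driver.)

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("SendExample"))
    val num = sc.parallelize(1 to 100)

    // Stand-in for send(x): in a real job this would write to a socket,
    // queue, or other channel, and it executes on the executors.
    num.foreach(x => println(s"sending $x"))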