Re: AVRO vs Parquet

2016-03-10 Thread Guru Medasani
Thanks Michael for clarifying this. My response is inline. Guru Medasani gdm...@gmail.com > On Mar 10, 2016, at 12:38 PM, Michael Armbrust <mich...@databricks.com> wrote: > > A few clarifications: > > 1) High memory and cpu usage. This is because Parquet fi

Re: AVRO vs Parquet

2016-03-09 Thread Guru Medasani
ta in the metastore and not have that be reflected in the Avro schema as well. Guru Medasani gdm...@gmail.com > On Mar 4, 2016, at 7:36 AM, Paul Leclercq <paul.lecle...@tabmo.io> wrote: > > > > Nice article about Parquet with Avro : > https://dzone.com/articles/und

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Guru Medasani
Hi Yanlin, This is a fairly new effort and is not officially released/supported by Cloudera yet. I believe those numbers will be out once it is released. Guru Medasani gdm...@gmail.com > On Mar 2, 2016, at 10:40 AM, yanlin wang <yanl...@me.com> wrote: > > Did any one use Liv

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Guru Medasani
Int = 78501'}, u'execution_count': 1, u'status': u'ok'}, u'state': u'available'} Guru Medasani gdm...@gmail.com > On Mar 2, 2016, at 7:47 AM, Todd Nist <tsind...@gmail.com> wrote: > > Have you looked at Apache Toree, http://toree.apache.org/ > <

Re: Error in load hbase on spark

2015-10-09 Thread Guru Medasani
a.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/> HBase Jira link: https://issues.apache.org/jira/browse/HBASE-13992 Guru Medasani gdm...@gmail.com > On Oct 8, 2015, at 9:29 PM, Roy Wang <roywang

Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Guru Medasani
he old option was deprecated, and aliased to the new one (spark.executor.userClassPathFirst). The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it was also doing some things that ended up causing JVM errors depending on how things were bein
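As a rough illustration of the option named in this message, the sketch below assembles a spark-submit command line that asks Spark to prefer the application's own jars (for example, a different protobuf version). The jar name and main class are hypothetical placeholders; spark.driver.userClassPathFirst is the driver-side counterpart from the Spark configuration documentation and is not mentioned in the preview above. Both settings are marked experimental in the Spark docs, so treat this as a starting point, not a recommendation.

```python
import shlex


def build_submit_command(app_jar, main_class, user_classpath_first=True):
    """Build a spark-submit argument list with user-classpath-first settings.

    app_jar and main_class are supplied by the caller; the values used in the
    example below are hypothetical.
    """
    flag = "true" if user_classpath_first else "false"
    return [
        "spark-submit",
        "--class", main_class,
        # Executor-side setting named in the thread.
        "--conf", f"spark.executor.userClassPathFirst={flag}",
        # Driver-side counterpart (assumption: you also want it on the driver).
        "--conf", f"spark.driver.userClassPathFirst={flag}",
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical application jar and entry point, for illustration only.
    cmd = build_submit_command("my-app-assembly.jar", "com.example.MyApp")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list keeps each argument separate, so it can be passed straight to subprocess.run without shell quoting problems.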

Re: Problem while loading saved data

2015-09-02 Thread Guru Medasani
mary file found under > file:/home/ubuntu/ipython/people.parquet2. Guru Medasani gdm...@gmail.com > On Sep 2, 2015, at 8:25 PM, Amila De Silva <jaa...@gmail.com> wrote: > > Hi All, > > I have a two node spark cluster, to which I'm connecting using IPython > notebo

Re: Spark + Jupyter (IPython Notebook)

2015-08-18 Thread Guru Medasani
Guru Medasani gdm...@gmail.com On Aug 18, 2015, at 12:29 PM, Jerry Lam chiling...@gmail.com wrote: Hi Guru, Thanks! Great to hear that someone tried it in production. How do you like it so far? Best Regards, Jerry On Tue, Aug 18, 2015 at 11:38 AM, Guru Medasani gdm

Re: Spark + Jupyter (IPython Notebook)

2015-08-18 Thread Guru Medasani
http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/ Guru Medasani gdm...@gmail.com On Aug 18, 2015, at 8:35 AM, Jerry Lam chiling...@gmail.com wrote: Hi spark users and developers, Did anyone have IPython Notebook (Jupyter) deployed in production

Re: Topology.py -- Cannot run on Spark Gateway on Cloudera 5.4.4.

2015-08-03 Thread Guru Medasani
Hi Upen, Did you deploy the client configs after assigning the gateway roles? You should be able to do this from Cloudera Manager. Can you try this and let us know what you see when you run spark-shell? Guru Medasani gdm...@gmail.com On Aug 3, 2015, at 9:10 PM, Upen N ukn...@gmail.com

Re: Spark-Submit error

2015-08-03 Thread Guru Medasani
Hi Satish, Can you add more error or log info to the email? Guru Medasani gdm...@gmail.com On Jul 31, 2015, at 1:06 AM, satish chandra j jsatishchan...@gmail.com wrote: HI, I have submitted a Spark Job with options jars,class,master as local but i am getting an error as below dse

Re: Spark-Submit error

2015-08-03 Thread Guru Medasani
Thanks Satish. I only see the INFO messages and don’t see any error messages in the output you pasted. Can you paste the log with the error messages? Guru Medasani gdm...@gmail.com On Aug 3, 2015, at 11:12 PM, satish chandra j jsatishchan...@gmail.com wrote: Hi Guru, I am executing

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Guru Medasani
in the thread? Guru Medasani gdm...@gmail.com On Jul 7, 2015, at 10:42 PM, Ashish Dutt ashish.du...@gmail.com wrote: Hi, I have CDH 5.4 installed on a linux server. It has 1 cluster in which spark is deployed as a history server. I am trying to connect my laptop to the spark history server. When

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Guru Medasani
on the server where Spark history server is running. Guru Medasani gdm...@gmail.com On Jul 8, 2015, at 12:01 AM, Ashish Dutt ashish.du...@gmail.com wrote: Hello Guru, Thank you for your quick response. This is what i get when I try executing spark-shell master ip:port number C:\spark

Re: Is there programmatic way running Spark job on Yarn cluster without using spark-submit script ?

2015-06-17 Thread Guru Medasani
http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/What-dependencies-to-submit-Spark-jobs-programmatically-not-via/td-p/24721 Guru Medasani gdm...@gmail.com On Jun 17, 2015, at 6:01 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: This is not independent

Re: SparkR 1.4.0: read.df() function fails

2015-06-16 Thread Guru Medasani
hdfs path should be able to help here. Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json Guru Medasani gdm...@gmail.com On Jun 16, 2015, at 10:39 AM

Re: Spark 1.4 release date

2015-06-12 Thread Guru Medasani
Here is the Spark 1.4 release blog post by Databricks: https://databricks.com/blog/2015/06/11/announcing-apache-spark-1-4.html Guru Medasani gdm...@gmail.com On Jun 12, 2015, at 7:08 AM, ayan guha guha.a...@gmail.com wrote

Re: Nightly builds/releases?

2015-05-04 Thread Guru Medasani
I see a Jira for this one, but it is unresolved: https://issues.apache.org/jira/browse/SPARK-1517 On May 4, 2015, at 10:25 PM, Ankur Chauhan achau...@brightcove.com wrote: Hi, Does anyone know if spark has any nightly builds or equivalent

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-28 Thread Guru Medasani
Hi Antony, Did you get past this error by repartitioning your job into smaller tasks, as Sven Krasser pointed out? From: Antony Mayi antonym...@yahoo.com Reply-To: Antony Mayi antonym...@yahoo.com Date: Tuesday, January 27, 2015 at 5:24 PM To: Guru Medasani gdm...@outlook.com, Sven Krasser

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
Hi Antony, What is the total amount of memory, in MB, that can be allocated to containers on your NodeManagers (yarn.nodemanager.resource.memory-mb)? Can you check this setting in the yarn-site.xml used by the NodeManager process? -Guru Medasani From: Sandy Ryza
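One way to check the value asked about here is to read it straight out of yarn-site.xml. The sketch below is a minimal example using only the Python standard library; the sample XML and its 8192 value are made up for illustration, and the location of the real file depends on your distribution (it is not given in the thread).

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the <value> of the named <property> in Hadoop-style config XML.

    Returns None when the property is not present.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative content only; a real yarn-site.xml holds many more properties.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_hadoop_property(SAMPLE, "yarn.nodemanager.resource.memory-mb"))
```

To use it against a live node, read the file contents from wherever your NodeManager's configuration directory is and pass the text to read_hadoop_property.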

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
Can you attach the logs where this is failing? From: Sven Krasser kras...@gmail.com Date: Tuesday, January 27, 2015 at 4:50 PM To: Guru Medasani gdm...@outlook.com Cc: Sandy Ryza sandy.r...@cloudera.com, Antony Mayi antonym...@yahoo.com, user@spark.apache.org user@spark.apache.org Subject

RE: Spark Installation Maven PermGen OutOfMemoryException

2014-12-23 Thread Guru Medasani
Hi Vladimir, From the link Sean posted, there is the following note for Java 8: "Note: For Java 8 and above this step is not required." So if you have no problems using Java 8, give it a shot. Best Regards, Guru Medasani From: so...@cloudera.com Date: Tue, 23 Dec 2014 15:04:42 +

RE: Spark Installation Maven PermGen OutOfMemoryException

2014-12-23 Thread Guru Medasani
Thanks for the clarification Sean. Best Regards,Guru Medasani From: so...@cloudera.com Date: Tue, 23 Dec 2014 15:39:59 + Subject: Re: Spark Installation Maven PermGen OutOfMemoryException To: gdm...@outlook.com CC: protsenk...@gmail.com; user@spark.apache.org The text

Re: Programatically running of the Spark Jobs.

2014-09-04 Thread Guru Medasani
I am able to run Spark jobs and Spark Streaming jobs successfully via YARN on a CDH cluster. When you say YARN isn’t quite there yet, do you mean for submitting jobs programmatically, or just in general? On Sep 4, 2014, at 1:45 AM, Matt Chu m...@kabam.com wrote:

Re: Spark-submit not running

2014-08-28 Thread Guru Medasani
Can you copy the exact spark-submit command that you are running? You should be able to run it locally without installing Hadoop. Here is an example of how to run the job locally. # Run application locally on 8 cores ./bin/spark-submit \ --class org.apache.spark.examples.SparkPi \ --master

Re: Spark-submit not running

2014-08-28 Thread Guru Medasani
think at the moment there is still a dependency on Hadoop even when not using it. See https://issues.apache.org/jira/browse/SPARK-2356 On Thu, Aug 28, 2014 at 2:14 PM, Guru Medasani gdm...@outlook.com wrote: Can you copy the exact spark-submit command that you are running? You should