Re: Read Parquet in Java Spark

2016-03-31 Thread UMESH CHAUDHARY
From Spark Documentation: DataFrame parquetFile = sqlContext.read().parquet("people.parquet"); JavaRDD<Row> jRDD = parquetFile.javaRDD(); The javaRDD() method will convert the DF to an RDD. On Thu, Mar 31, 2016 at 2:51 PM, Ramkumar V wrote: > Hi, > > I'm trying to read parquet log

Re: confusing about Spark SQL json format

2016-03-31 Thread UMESH CHAUDHARY
"California"}} > ] > > > --- > > 2. > > > > > {"name": ["

Re: confusing about Spark SQL json format

2016-03-31 Thread UMESH CHAUDHARY
Hi, The object diagram on json.org describes the formulation of the JSON below: Object 1 => {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}} Object 2 => {"name":"Michael", "address":{"city":null, "state":"California"}} Note that
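A point worth making explicit here (an assumption about Spark's JSON reader of this era): it expects one complete, self-contained JSON object per line ("JSON Lines"), not a pretty-printed array. This Spark-free sketch checks that property with a simple brace-balance test; the class and method names are illustrative.

```java
public class JsonLinesCheck {
    // Returns true if the line starts with '{' and its braces balance to zero,
    // ignoring braces that appear inside double-quoted strings.
    static boolean isSingleObject(String line) {
        line = line.trim();
        if (!line.startsWith("{")) return false;
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"' && (i == 0 || line.charAt(i - 1) != '\\')) inString = !inString;
            else if (!inString && c == '{') depth++;
            else if (!inString && c == '}') depth--;
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        String ok = "{\"name\":\"Yin\", \"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}";
        String bad = "[ {\"name\":\"Yin\"},"; // fragment of a pretty-printed array: not one object
        System.out.println(isSingleObject(ok));  // true
        System.out.println(isSingleObject(bad)); // false
    }
}
```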

Re: Twitter receiver not running in spark 1.6.0

2016-03-28 Thread UMESH CHAUDHARY
Could you post your code what you are using for streaming context ! On Mon, Mar 28, 2016 at 10:31 AM, lokeshkumar wrote: > Hi forum > > For some reason if I include a twitter receiver and start the streaming > context, I get the below exception not sure why > Can someone let

Re: Read files dynamically having different schema under one parent directory + scala + Spakr 1.5,2

2016-02-19 Thread UMESH CHAUDHARY
If I understood correctly, you can have many sub-dirs under *hdfs:///TestDirectory *and you need to attach a schema to all part files in a sub-dir. 1) I am assuming that you know the sub-dirs names : For that, you need to list all sub-dirs inside *hdfs:///TestDirectory *using Scala,
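The listing step above can be sketched without a cluster. This is a local-filesystem analogue (on real HDFS you would use Hadoop's FileSystem.listStatus instead; the iteration pattern is the same, and the class name is illustrative):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class ListSubDirs {
    // Collect the names of every immediate sub-directory under a parent,
    // so each one can later be read with its own schema.
    static List<String> subDirs(Path parent) throws IOException {
        try (Stream<Path> s = Files.list(parent)) {
            return s.filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("TestDirectory");
        Files.createDirectory(parent.resolve("schemaA"));
        Files.createDirectory(parent.resolve("schemaB"));
        System.out.println(subDirs(parent)); // [schemaA, schemaB]
    }
}
```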

Re: Hive REGEXP_REPLACE use or equivalent in Spark

2016-02-19 Thread UMESH CHAUDHARY
My CSV: *name,checked-in,booking_cost* AC,true,1200 BK,false,0 DDC,true,1200 I have done: val textFile=sc.textFile("/home/user/sampleCSV.txt") val schemaString="name,checked-in,booking_cost" import org.apache.spark.sql.Row; import
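Underneath, Hive's REGEXP_REPLACE(str, pattern, replacement) is ordinary Java regex replacement, so the equivalent can be wrapped in a Spark UDF. Here it is shown stand-alone on one row of the sample CSV (the helper name is illustrative):

```java
import java.util.regex.Pattern;

public class RegexpReplaceDemo {
    // Same semantics as Hive's REGEXP_REPLACE: replace every match of
    // the regex pattern in the input with the replacement string.
    static String regexpReplace(String input, String pattern, String replacement) {
        return Pattern.compile(pattern).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // e.g. normalise the boolean "checked-in" column of the sample CSV
        String row = "AC,true,1200";
        System.out.println(regexpReplace(row, "true", "YES")); // AC,YES,1200
    }
}
```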

Re: New line lost in streaming output file

2016-02-16 Thread UMESH CHAUDHARY
Try to print RDD before writing to validate that you are getting '\n' from Kafka. On Tue, Feb 16, 2016 at 4:19 PM, Ashutosh Kumar wrote: > Hi Chandeep, > Thanks for response. Issue is the new line feed is lost. All records > appear in one line only. > > Thanks >

Re: Spark on Windows

2016-02-15 Thread UMESH CHAUDHARY
You can check "spark.master" property in conf/spark-defaults.conf and try to give IP of the VM in place of "localhost". On Tue, Feb 16, 2016 at 7:48 AM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > Hi, > > I am new to spark and starting working on it by writing small programs. I > am
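The suggested change is a one-line edit to the defaults file; a sketch with a hypothetical VM address (replace it with the address of your own VM):

```
# conf/spark-defaults.conf
spark.master    spark://192.168.56.101:7077
```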

Fwd: how to submit multiple jar files when using spark-submit script in shell?

2016-01-11 Thread UMESH CHAUDHARY
Could you build a fat jar by including all your dependencies along with your application? See here and here
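One common way to produce such a fat jar is the maven-shade-plugin; a minimal pom.xml fragment (the version shown is illustrative):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn package` emits a single jar containing your classes and their dependencies, which can be passed to spark-submit on its own.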

Re: How to load partial data from HDFS using Spark SQL

2016-01-01 Thread UMESH CHAUDHARY
Ok, so what's wrong with using: var df = HiveContext.sql("Select * from table where id = ") // filtered data frame df.count On Sat, Jan 2, 2016 at 11:56 AM, SRK wrote: > Hi, > > How to load partial data from hdfs using Spark SQL? Suppose I want to load > data based on a

Re: persist spark output in hive using DataFrame and saveAsTable API

2015-12-07 Thread UMESH CHAUDHARY
Currently saveAsTable will create a Hive internal table by default, see here. If you want to save it as an external table, use saveAsParquetFile and create an external Hive table on that parquet file. On Mon,
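The second step can be sketched as Hive DDL; the table name, columns, and path below are illustrative, not from the original thread:

```sql
-- After df.saveAsParquetFile("hdfs:///warehouse/my_table"), register an
-- external Hive table over those parquet files.
CREATE EXTERNAL TABLE my_table (name STRING, booking_cost INT)
STORED AS PARQUET
LOCATION 'hdfs:///warehouse/my_table';
```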

Conversely, Hive is performing better than Spark-Sql

2015-11-24 Thread UMESH CHAUDHARY
Hi, I am using Hive 1.1.0 and Spark 1.5.1 and creating a hive context in spark-shell. I am now seeing Hive outperform Spark-SQL, the reverse of what I expected. By default Hive gives the result back in 27 seconds for a plain select * query on a 1 GB dataset containing 3623203 records, while spark-sql gives back

Re: ClassCastException while reading data from HDFS through Spark

2015-10-07 Thread UMESH CHAUDHARY
As per the Exception, it looks like there is a mismatch between the sequence file's actual value type and the one provided in your code. Change BytesWritable to *LongWritable * and check the execution. -Umesh On Wed, Oct 7, 2015 at 2:41 PM, Vinoth Sankar wrote: >

Re: What am I missing that's preventing javac from finding the libraries (CLASSPATH is setup...)?

2015-08-19 Thread UMESH CHAUDHARY
Just add spark_1.4.1_yarn_shuffle.jar in ClassPath or create a New Maven project using below dependency: <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.11</artifactId> <version>1.4.1</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_2.11</artifactId>

Re: Too many files/dirs in hdfs

2015-08-18 Thread UMESH CHAUDHARY
, Mohit Anchlia mohitanch...@gmail.com wrote: Is there a way to store all the results in one file and keep the file roll over separate than the spark streaming batch interval? On Mon, Aug 17, 2015 at 2:39 AM, UMESH CHAUDHARY umesh9...@gmail.com wrote: In Spark Streaming you can simply check

Re: Too many files/dirs in hdfs

2015-08-17 Thread UMESH CHAUDHARY
In Spark Streaming you can simply check whether your RDD contains any records or not, and if records are there you can save them using FileOutputStream: DStream.foreachRDD(t => { var count = t.count(); if (count > 0) { // SAVE YOUR STUFF } }); This will not create unnecessary files of 0 bytes. On Mon,
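The pattern above can be exercised without a streaming context. This stand-alone analogue (class and method names are illustrative) only writes an output file when the batch actually contains records, so no 0-byte files appear:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class SaveIfNonEmpty {
    // Write the batch to a file only when it has records; returns whether a write happened.
    static boolean saveBatch(List<String> records, Path out) throws IOException {
        if (records.isEmpty()) return false; // skip empty batches entirely
        Files.write(out, records);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("batch", ".txt");
        System.out.println(saveBatch(java.util.Collections.emptyList(), out)); // false: nothing written
        System.out.println(saveBatch(java.util.Arrays.asList("a", "b"), out)); // true
    }
}
```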

Streaming on Exponential Data

2015-08-13 Thread UMESH CHAUDHARY
Hi, I was working with the non-reliable receiver version of Spark-Kafka streaming, i.e. KafkaUtils.createStream... where for testing purposes I was getting data at a constant rate from Kafka and it was acting as expected. But when the data rate in Kafka grew exponentially, my program started crashing saying
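For receiver-based streams of this era, one mitigation is to cap the ingest rate in spark-defaults.conf; a sketch (the rate value is illustrative, and the backpressure flag is available from Spark 1.5 onward):

```
# Limit records/sec per receiver so a sudden burst in Kafka cannot overwhelm the job
spark.streaming.receiver.maxRate        10000
# Let Spark adapt the rate to the current processing speed (Spark 1.5+)
spark.streaming.backpressure.enabled    true
```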

Re: Heatmap with Spark Streaming

2015-07-30 Thread UMESH CHAUDHARY
with d3 js. Thanks Best Regards On Tue, Jul 28, 2015 at 12:18 PM, UMESH CHAUDHARY umesh9...@gmail.com wrote: I have just started using Spark Streaming and done few POCs. It is fairly easy to implement. I was thinking of presenting the data using some smart graphing dashboarding tools e.g