Running Google Dataflow on Spark

2016-11-02 Thread Ashutosh Kumar
I am trying to run Google Dataflow code on Spark. It works fine as Google Dataflow on Google Cloud Platform, but while running on Spark I am getting the following error: 16/11/02 11:14:32 INFO com.cloudera.dataflow.spark.SparkPipelineRunner: Evaluating ParDo(GroupByKeyHashAndSortByKeyAndWindow)

Re: Create dataframe column from list

2016-07-22 Thread Ashutosh Kumar
http://stackoverflow.com/questions/36382052/converting-list-to-column-in-spark On Fri, Jul 22, 2016 at 5:15 PM, Divya Gehlot wrote: > Hi, > Can somebody help me by creating the dataframe column from the Scala list. > Would really appreciate the help. > > Thanks, >
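One way to attach a list to an existing DataFrame is positional pairing: in Spark this is typically `rdd.zip` (or `zipWithIndex` plus a join) before rebuilding the DataFrame. A plain-Python sketch of the pairing logic only; the row data and column values are made up for illustration:

```python
# Existing rows and the new column's values, matched by position.
rows = [("alice", 30), ("bob", 25)]
new_column = ["engineer", "doctor"]

# Pair each row with the corresponding list element. In Spark this positional
# pairing is what rdd.zip does; the lists must have equal length.
extended = [row + (value,) for row, value in zip(rows, new_column)]
print(extended)  # [('alice', 30, 'engineer'), ('bob', 25, 'doctor')]
```

Note that positional zipping only works when the order of rows is deterministic, which is why the zipWithIndex-plus-join variant is often safer on a real RDD.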

Re: Reading multiple json files from nested folders for data frame

2016-07-21 Thread Ashutosh Kumar
rtitioning and indexing in ORC its blazing > fast (query 64 million rows x 570 columns in 19 seconds). There is perhaps > a reason why SPARK makes things slow while using ORC :) > > > Regards, > Gourav > > On Thu, Jul 21, 2016 at 12:40 PM, Ashutosh Kumar <kmr.ashutos...@gmai

Re: Reading multiple json files from nested folders for data frame

2016-07-21 Thread Ashutosh Kumar
our/folder/*.json" > All files will be loaded into a dataframe and schema will be the union of > all the different schemas of your json files (only if you have different > schemas) > It should work - let me know > > Simone Miraglia > ------ >

Re: Reading multiple json files from nested folders for data frame

2016-07-21 Thread Ashutosh Kumar
-programming-guide.html#json-datasets > > Hope it helps > > Simone Miraglia > ------ > From: Ashutosh Kumar <kmr.ashutos...@gmail.com> > Sent: 21/07/2016 08:19 > To: user @spark <user@spark.apache.org> > Subject: Reading multiple json

Re: Reading multiple json files from nested folders for data frame

2016-07-21 Thread Ashutosh Kumar
There is no database. I read the files from Google Cloud Storage / S3 / HDFS. Thanks Ashutosh On Thu, Jul 21, 2016 at 11:50 AM, Sree Eedupuganti wrote: > Which database are you using? >

Reading multiple json files from nested folders for data frame

2016-07-21 Thread Ashutosh Kumar
I need to read a bunch of JSON files kept in date-wise folders and perform SQL queries on them using a DataFrame. Is it possible to do so? Please provide some pointers. Thanks Ashutosh
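As the replies above note, Spark handles this with a glob path, e.g. `sqlContext.read.json("/root/*/*.json")`, where one wildcard level matches every date folder (the exact layout is an assumption here). A minimal plain-Python sketch of how that wildcard picks up files across date-wise folders:

```python
import glob
import json
import os
import tempfile

# Build a toy date-wise layout: root/2016-07-20/part.json, root/2016-07-21/part.json
root = tempfile.mkdtemp()
for day, record in [("2016-07-20", {"id": 1}), ("2016-07-21", {"id": 2})]:
    os.makedirs(os.path.join(root, day))
    with open(os.path.join(root, day, "part.json"), "w") as f:
        f.write(json.dumps(record) + "\n")

# One wildcard level matches every date folder, which is what a path
# like "root/*/*.json" does when handed to Spark's JSON reader.
rows = []
for path in sorted(glob.glob(os.path.join(root, "*", "*.json"))):
    with open(path) as f:
        rows.extend(json.loads(line) for line in f if line.strip())

print(rows)  # [{'id': 1}, {'id': 2}]
```

In Spark the resulting DataFrame's schema is the union of the schemas of all matched files, as one of the replies in this thread points out.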

Re: Streaming K-means not printing predictions

2016-04-28 Thread Ashutosh Kumar
model.setRandomCenters takes two arguments, whereas the Java method needs 3? Any clues? Thanks Ashutosh On Wed, Apr 27, 2016 at 9:59 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com> wrote: > The problem seems to be streamconxt.textFileStream(path) is not reading > the file at all. It does n
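A likely explanation for the argument-count mismatch: in Scala, `StreamingKMeans.setRandomCenters(dim, weight, seed)` gives `seed` a default value, so two-argument calls compile; Java has no default arguments, so all three must be passed. A conceptual plain-Python sketch of what that call initializes (not Spark's actual implementation):

```python
import random

def set_random_centers(k, dim, weight, seed):
    """Conceptually what setRandomCenters does: draw k random centers of
    dimension dim and give every center the same initial weight. The seed
    is the parameter Scala defaults and Java requires explicitly."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    weights = [weight] * k
    return centers, weights

centers, weights = set_random_centers(k=2, dim=3, weight=0.0, seed=42)
print(len(centers), len(centers[0]), weights)  # 2 3 [0.0, 0.0]
```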

Re: Streaming K-means not printing predictions

2016-04-27 Thread Ashutosh Kumar
. Thanks Ashutosh On Wed, Apr 27, 2016 at 2:43 PM, Niki Pavlopoulou <n...@exonar.com> wrote: > One of the reasons that happened to me (assuming everything is ok on your > streaming process), is if you run it on local mode instead of local[*] use > local[4]. > > On 26 April 20

removing header from csv file

2016-04-26 Thread Ashutosh Kumar
I see there is a library, spark-csv, which can be used for removing the header and processing CSV files. But it seems it works with SQLContext only. Is there a way to remove the header from CSV files without SQLContext? Thanks Ashutosh
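Without SQLContext, the usual RDD-level trick is `rdd.mapPartitionsWithIndex` with a function that drops the first line of partition 0 only (the header lives in the first partition of a text file). A plain-Python model of that function, with partitions simulated as lists of lines:

```python
def drop_header(partition_index, lines):
    """Skip the first line of the first partition; pass everything else through.
    This is the function one would hand to rdd.mapPartitionsWithIndex."""
    it = iter(lines)
    if partition_index == 0:
        next(it, None)  # discard the header row
    return it

# Two simulated partitions of a CSV file; only partition 0 holds the header.
partitions = [["name,age", "alice,30"], ["bob,25"]]

records = [line
           for idx, part in enumerate(partitions)
           for line in drop_header(idx, part)]
print(records)  # ['alice,30', 'bob,25']
```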

Streaming K-means not printing predictions

2016-04-26 Thread Ashutosh Kumar
I created a Streaming K-means based on the Scala example. It keeps running without any error but never prints predictions. Here is the log: 19:15:05,050 INFO org.apache.spark.streaming.scheduler.InputInfoTracker - remove old batch metadata: 146167824 ms 19:15:10,001 INFO

Re: Spark Streaming - graceful shutdown when stream has no more data

2016-02-23 Thread Ashutosh Kumar
Just out of curiosity, I would like to know why a streaming program should shut down when no new data is arriving. I think it should keep waiting for the arrival of new records. Thanks Ashutosh On Tue, Feb 23, 2016 at 9:17 PM, Hemant Bhanawat wrote: > A guess - parseRecord is
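For jobs over a finite input (the scenario behind this thread), one common pattern is an idle timeout: call `ssc.stop(stopSparkContext = true, stopGracefully = true)` once no batch has carried data for some threshold. A plain-Python sketch of just the timeout check; the 30-second threshold and the event timeline are arbitrary assumptions:

```python
def should_stop(last_data_time, now, idle_timeout_s):
    """True once no batch has carried data for at least idle_timeout_s."""
    return (now - last_data_time) >= idle_timeout_s

# Simulated batch ticks: (timestamp, batch_had_data).
# Data arrives at t=0 and t=2, then silence.
events = [(0.0, True), (2.0, True), (4.0, False), (35.0, False)]

last_data_time = 0.0
stopped_at = None
for now, has_data in events:
    if has_data:
        last_data_time = now
    elif should_stop(last_data_time, now, idle_timeout_s=30.0):
        stopped_at = now  # this is where the graceful ssc.stop(...) would go
        break

print(stopped_at)  # 35.0
```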

Re: New line lost in streaming output file

2016-02-17 Thread Ashutosh Kumar
b 16, 2016 at 4:19 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com> > wrote: > >> Hi Chandeep, >> Thanks for response. Issue is the new line feed is lost. All records >> appear in one line only. >> >> Thanks >> Ashutosh >> >> On Tue, Feb 16,

Re: New line lost in streaming output file

2016-02-16 Thread Ashutosh Kumar
On Feb 16, 2016, at 9:33 AM, Ashutosh Kumar <kmr.ashutos...@gmail.com> > wrote: > > I am getting multiple empty files for streaming output for each interval. > To Avoid this I tried > > kStream.foreachRDD(new VoidFunction2<JavaRDD,Time>(){ > > > > >

New line lost in streaming output file

2016-02-16 Thread Ashutosh Kumar
I am getting multiple empty files for streaming output for each interval. To avoid this I tried kStream.foreachRDD(new VoidFunction2<JavaRDD, Time>() { public void call(JavaRDD rdd, Time time) throws Exception { if (!rdd.isEmpty()) {
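One common cause of this symptom, assuming the records are assembled into a single string by hand (e.g. after `rdd.collect()`) before writing: concatenation without a separator. A minimal sketch of the failure mode and the fix:

```python
records = ["a,1", "b,2", "c,3"]

# Concatenating records directly loses the line boundaries:
# every record ends up on one line.
collapsed = "".join(records)

# Join with an explicit newline (plus a trailing one) before writing the file.
output = "\n".join(records) + "\n"

print(repr(collapsed))  # 'a,1b,2c,3'
print(repr(output))     # 'a,1\nb,2\nc,3\n'
```

By contrast, `rdd.saveAsTextFile` writes one record per line on its own, so the hand-written join is only needed when bypassing it.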

Re: New line lost in streaming output file

2016-02-15 Thread Ashutosh Kumar
Requesting some pointers on this. Thanks On Mon, Feb 15, 2016 at 3:39 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com> wrote: > I am getting multiple empty files for streaming output for each interval. > To avoid this I tried > > kStream.foreachRDD(new VoidFunc

New line lost in streaming output file

2016-02-15 Thread Ashutosh Kumar
I am getting multiple empty files for streaming output for each interval. To avoid this I tried kStream.foreachRDD(new VoidFunction2<JavaRDD, Time>() { public void call(JavaRDD rdd, Time time) throws Exception { if (!rdd.isEmpty()) {

Tool for Visualization /Plotting of K means cluster

2016-01-22 Thread Ashutosh Kumar
I am looking for an easy-to-use visualization tool for the KMeansModel produced as a result of clustering. Thanks Ashutosh
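One low-effort route, short of a dedicated tool: dump `model.clusterCenters` to CSV and scatter-plot it in whatever is handy (a spreadsheet, gnuplot, a notebook). A sketch with made-up 2-D centers:

```python
import csv
import io

# Hypothetical 2-D cluster centers, as KMeansModel.clusterCenters would yield.
cluster_centers = [[1.0, 2.0], [5.5, -0.5], [3.2, 4.1]]

# Write a small CSV (header row plus one row per center) that any
# plotting tool can scatter-plot directly.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["x", "y"])
writer.writerows(cluster_centers)
print(buf.getvalue())
```

For higher-dimensional centers the same export works, but the points would need a projection (e.g. picking two features) before plotting.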