Re: Issue in spark job. Remote rpc client dissociated

2016-07-14 Thread Balachandar R.A.
> > On Jul 14, 2016, at 12:28, Balachandar R.A. <balachandar...@gmail.com> wrote: > > Hello Ted, > > Thanks for the response. Here is the additional information. > > I am using spark 1.6.1 (spark-1.6.1-bin-hadoop2.6) > > Here is

Re: Issue in spark job. Remote rpc client dissociated

2016-07-13 Thread Balachandar R.A.
> > Hello Ted, > Thanks for the response. Here is the additional information. > I am using spark 1.6.1 (spark-1.6.1-bin-hadoop2.6) > > Here is the code snippet > > JavaRDD add = jsc.parallelize(listFolders, listFolders.size()); > > JavaRDD test = add.map(new

Issue in spark job. Remote rpc client dissociated

2016-07-13 Thread Balachandar R.A.
Hello, In one of my use cases, I need to process a list of folders in parallel. I used sc.parallelize(list, list.size).map("logic to process the folder"). I have a six-node cluster and there are six folders to process. Ideally, I expect each of my nodes to process one folder. But, I see that a
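
A minimal sketch of the pattern described above, assuming the folders arrive as a List[String] of paths and processFolder stands in for the per-folder logic (both names are hypothetical); forcing one partition per folder is what should spread the six folders across the six nodes:

    import org.apache.spark.{SparkConf, SparkContext}

    object FolderJob {
      // Hypothetical stand-in for the black-box per-folder processing.
      def processFolder(path: String): Int = {
        // ... run the real logic against `path`, return a status code
        0
      }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("folder-job"))
        val folders = List("/data/f1", "/data/f2", "/data/f3",
                           "/data/f4", "/data/f5", "/data/f6")
        // One partition per folder, so each task gets exactly one folder.
        val statuses = sc.parallelize(folders, folders.size).map(processFolder).collect()
        statuses.foreach(println)
        sc.stop()
      }
    }

Note that even with one partition per folder, Spark's scheduler is free to run several tasks on the same node if that node has enough free cores, which may explain seeing more than one folder handled per machine.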

Spark job state is EXITED but does not return

2016-07-11 Thread Balachandar R.A.
Hello, I have a simple Apache Spark based use case that processes two datasets. Each dataset takes about 5-7 min to process. I am doing this processing inside the sc.parallelize(datasets){ } block. While the first dataset is processed successfully, the processing of the second dataset is not started by
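
For reference, a sketch of the shape such a job usually takes, with processDataset as a hypothetical stand-in for the 5-7 minute computation; nothing actually runs on the cluster until an action such as collect() is invoked on the mapped RDD:

    val datasets = Seq("dataset-A", "dataset-B")          // hypothetical identifiers
    val results = sc.parallelize(datasets, datasets.size)
                    .map(processDataset)                  // executes on the executors
                    .collect()                            // action: triggers both tasks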

Re: One map per folder in spark or Hadoop

2016-07-07 Thread Balachandar R.A.
{ iter => > val folder = iter.next > val status: Int = > Seq(status).toIterator > } > > On Jun 30, 2016, at 16:42, Balachandar R.A. <balachandar...@gmail.com> > wrote: > > Hello, > > I have some 100 folders. Each folder contains 5 files. I have an

Re: One map per folder in spark or Hadoop

2016-06-30 Thread Balachandar R.A.
mapPartitions { iter => > val folder = iter.next > val status: Int = > Seq(status).toIterator > } > > On Jun 30, 2016, at 16:42, Balachandar R.A. <balachandar...@gmail.com> > wrote: > > Hello, > > I have some 100 folders. Each folder contains 5 files. I have

One map per folder in spark or Hadoop

2016-06-30 Thread Balachandar R.A.
Hello, I have some 100 folders. Each folder contains 5 files. I have an executable that processes one folder. The executable is a black box and hence cannot be modified. I would like to process the 100 folders in parallel using Apache Spark so that I can spawn a map task per folder. Can
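
The replies above suggest mapPartitions; a fuller sketch of that idea, assuming a hypothetical black-box executable at /opt/tool/process that takes a folder path as its only argument:

    import scala.sys.process._

    val folders = sc.parallelize(folderList, folderList.size)  // folderList: the 100 paths
    val exitCodes = folders.mapPartitions { iter =>
      iter.map { folder =>
        // Launch the external executable on this folder; `!` runs it
        // and returns its exit code.
        (folder, Seq("/opt/tool/process", folder).!)
      }
    }
    exitCodes.collect().foreach { case (f, c) => println(s"$f exited with $c") }

Because the executable runs outside the JVM, it has to be installed at the same path on every worker node (or shipped alongside the job, e.g. with --files).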

How to calculate weighted degrees in GraphX

2016-02-01 Thread Balachandar R.A.
I am new to GraphX and exploring the example flight data analysis found online: http://www.sparktutorials.net/analyzing-flight-data:-a-gentle-introduction-to-graphx-in-spark I tried calculating inDegrees (to understand how many incoming flights an airport has), but I see a value which corresponds to
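
inDegrees only counts incoming edges; to weight each incoming route by an edge attribute, aggregateMessages can sum that attribute instead. A sketch, assuming a Graph[String, Int] whose Int edge attribute carries the flight count per route (an assumption, since the tutorial's exact schema is not shown here):

    import org.apache.spark.graphx._

    // graph: Graph[String, Int], edge attribute = number of flights on the route
    val weightedInDegrees: VertexRDD[Int] = graph.aggregateMessages[Int](
      ctx => ctx.sendToDst(ctx.attr),  // send the edge weight to the destination airport
      _ + _                            // sum the weights arriving at each vertex
    )
    weightedInDegrees.take(10).foreach(println)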

Re: GraphX can show graph?

2016-01-29 Thread Balachandar R.A.
Thanks... Will look into that - Bala On 28 January 2016 at 15:36, Sahil Sareen <sareen...@gmail.com> wrote: > Try Neo4j for visualization, GraphX does a pretty good job at distributed > graph processing. > > On Thu, Jan 28, 2016 at 12:42 PM, Balachandar R.A. < > balacha

converting categorical values in csv file to numerical values

2015-11-05 Thread Balachandar R.A.
Hi, I am new to Spark MLlib and machine learning. I have a CSV file that consists of around 100 thousand rows and 20 columns. Of these 20 columns, 10 contain string values. The values in these columns are not necessarily unique. They are kind of categorical, that is, the values could be one

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread Balachandar R.A.
> Can't you do a simple dictionary and map those values to numbers? > > Cheers > Guillaume > > On 5 November 2015 at 09:54, Balachandar R.A. <balachandar...@gmail.com> wrote: > >> Hi >> >> I am new to Spark MLlib and machine learning. I h
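
A sketch of the dictionary idea for one column, assuming the rows are already split into Array[String] and the categorical column sits at a known index (both hypothetical):

    val colIdx = 3                                      // hypothetical column position
    val rows = csvRdd.map(_.split(","))                 // csvRdd: RDD[String] of CSV lines
    // Build a value -> number dictionary from the distinct values in that column.
    val dict = rows.map(_(colIdx)).distinct().zipWithIndex().collectAsMap()
    val bDict = sc.broadcast(dict)
    // Replace each string value with its numeric code.
    val encoded = rows.map(r => r.updated(colIdx, bDict.value(r(colIdx)).toString))

Spark ML's StringIndexer does essentially the same mapping on DataFrames, which may be the more idiomatic route inside an MLlib pipeline.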

Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Balachandar R.A.
-- Forwarded message -- From: "Balachandar R.A." <balachandar...@gmail.com> Date: 02-Nov-2015 12:53 pm Subject: Re: Error : - No filesystem for scheme: spark To: "Jean-Baptiste Onofré" <j...@nanthrax.net> Cc: > HI JB, > Thanks for the respo

Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Balachandar R.A.
>> On 2 November 2015 at 14:59, Romi Kuntsman <r...@totango.com> wrote: >> >> except "spark.master", do you have "spark://" anywhere in your code >> or config files? >>

Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Balachandar R.A.
Romi Kuntsman, Big Data Engineer > http://www.totango.com > > On Mon, Nov 2, 2015 at 11:27 AM, Balachandar R.A. < > balachandar...@gmail.com> wrote: > >> -- Forwarded message -- >> From: "Balachandar R.A." <balachandar...@g

Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Balachandar R.A.
It seems I made a stupid mistake. I supplied the --master option with the Spark URL in my launch command, and this error is gone. Thanks for pointing out possible places for troubleshooting. Regards, Bala On 02-Nov-2015 3:15 pm, "Balachandar R.A." <balachandar...@gmail.com> wrote:
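
For the record, a hedged example of the launch-command shape that avoids the error (host, class, and jar names are placeholders); the point is that the spark:// URL belongs to --master, not anywhere a file path is expected:

    spark-submit \
      --master spark://master-host:7077 \
      --class com.example.MyJob \
      my-job.jar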

Error : - No filesystem for scheme: spark

2015-11-01 Thread Balachandar R.A.
Can someone tell me at what point this error could occur? In one of my use cases, I am trying to use a custom Hadoop input format. Here is my code: val hConf: Configuration = sc.hadoopConfiguration hConf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)

Using Hadoop Custom Input format in Spark

2015-10-27 Thread Balachandar R.A.
Hello, I have developed a Hadoop-based solution that processes a binary file. This uses the classic Hadoop MR technique. The binary file is about 10GB and divided into 73 HDFS blocks, and the business logic, written as a map process, operates on each of these 73 blocks. We have developed a
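
Continuing in that direction, a sketch of plugging a custom input format into Spark, assuming a hypothetical BinaryChunkInputFormat (new-API, LongWritable keys, BytesWritable values) that splits the file one record per HDFS block as the MR job did:

    import org.apache.hadoop.io.{BytesWritable, LongWritable}

    val records = sc.newAPIHadoopFile(
      "hdfs:///data/input.bin",                     // hypothetical path
      classOf[com.example.BinaryChunkInputFormat],  // hypothetical InputFormat
      classOf[LongWritable],
      classOf[BytesWritable],
      sc.hadoopConfiguration)
    // One task per split, i.e. one map per HDFS block, as in the MR version.
    val sizes = records.map { case (_, bytes) => bytes.getLength }
    println(sizes.sum())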