Unsubscribe

2016-08-15 Thread Sarath Chandra

Issue with wholeTextFiles

2016-03-21 Thread Sarath Chandra
I'm using Hadoop 1.0.4 and Spark 1.2.0. I'm facing a strange issue. I have a requirement to read a small file from HDFS and all its content has to be read in one shot. So I'm using the Spark context's wholeTextFiles API, passing the HDFS URL for the file. When I try this from a spark shell it's
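A minimal sketch of the wholeTextFiles call the post describes: it returns (path, content) pairs, so the whole file arrives as one string rather than line by line. The HDFS URL and app name below are placeholders, not values from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WholeFileRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WholeFileRead"))
    // wholeTextFiles yields one (filePath, fullContent) pair per matched file
    val files = sc.wholeTextFiles("hdfs://namenode:9000/data/small.txt")
    files.collect().foreach { case (path, content) =>
      println(s"$path -> ${content.length} characters")
    }
    sc.stop()
  }
}
```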

Assign unique link ID

2015-10-31 Thread Sarath Chandra
Hi All, I have a hive table where data from 2 different sources (S1 and S2) get accumulated. Sample data below - *RECORD_ID|SOURCE_TYPE|TRN_NO|DATE1|DATE2|BRANCH|REF1|REF2|REF3|REF4|REF5|REF6|DC_FLAG|AMOUNT|CURRENCY*

Re: Assign unique link ID

2015-10-31 Thread Sarath Chandra
out of all columns you have used for joining. So, records 1 and 4 should generate the same hash value. 3. group by using this new id (you have already linked the records) and pull out required fields. Please let the group know if it works... Best Ayan
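A rough sketch of the approach this reply outlines: derive a link ID by hashing the join columns so records that agree on them get the same ID, then group on that ID. The case class, column names, and sample rows below are illustrative stand-ins, assuming an existing SparkContext `sc`.

```scala
import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark releases

// Txn is a hypothetical stand-in for a row of the Hive table described in the post
case class Txn(recordId: String, sourceType: String, trnNo: String, amount: Double)

val txns = sc.parallelize(Seq(
  Txn("1", "S1", "T100", 250.0),
  Txn("4", "S2", "T100", 250.0)))            // records 1 and 4 should link

val linked = txns
  .keyBy(t => (t.trnNo, t.amount).hashCode)  // link ID derived from the join columns
  .groupByKey()                              // records sharing the ID are now linked

linked.collect().foreach { case (id, recs) => println(s"$id -> ${recs.mkString(", ")}") }
```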

PermGen Space Error

2015-07-29 Thread Sarath Chandra
Dear All, I'm using Spark 1.2.0, Hive 0.13.1, Mesos 0.18.1, Spring, and JDK 1.7. I've written a Scala program which instantiates a Spark and Hive context, parses an XML file which provides the where clauses for queries, and generates full-fledged Hive queries to be run on Hive
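A minimal sketch of the setup the post describes, using the Spark 1.2-era HiveContext; the table name and where clause are placeholders for what the program generates from the XML.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("XmlDrivenHiveQueries"))
val hiveContext = new HiveContext(sc)

// in the original program the where clause comes from the parsed XML file
val whereClause = "dt = '2015-07-29'"
val result = hiveContext.sql(s"SELECT * FROM some_table WHERE $whereClause")
result.collect().foreach(println)
```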

Re: PermGen Space Error

2015-07-29 Thread Sarath Chandra
laptop having 4 CPUs and 12GB RAM. On Wed, Jul 29, 2015 at 2:49 PM, fightf...@163.com fightf...@163.com wrote: Hi, Sarath Did you try to use and increase spark.executor.extraJavaOptions -XX:PermSize= -XX:MaxPermSize= -- fightf...@163.com *From:* Sarath Chandra
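A hedged sketch of the suggestion in this reply; the sizes are illustrative. Note that the driver's own PermGen has to be set when its JVM is launched (for example via --driver-java-options on spark-submit), since the driver is already running by the time SparkConf is read.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("PermGenTuning")
  // raise executor PermGen; only relevant on JDK 7 and earlier
  .set("spark.executor.extraJavaOptions", "-XX:PermSize=128m -XX:MaxPermSize=512m")
```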

Re: PermGen Space Error

2015-07-29 Thread Sarath Chandra
with this option to rule out a config problem. On Wed, Jul 29, 2015 at 10:45 AM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: Yes. As mentioned at the end of my mail, I tried with both the 256 and 512 options. But the issue persists. I'm giving the following parameters to spark

Re: Unable to submit spark job to mesos cluster

2015-03-04 Thread Sarath Chandra
, *Sarath Chandra Josyam* Sr. Technical Architect *Algofusion Technologies India Pvt. Ltd.* Email: sarathchandra.jos...@algofusiontech.com Phone: +91-80-65330112/113 Mobile: +91 8762491331 On Wed, Mar 4, 2015 at 5:08 PM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: Hi, I have

Unable to submit spark job to mesos cluster

2015-03-04 Thread Sarath Chandra
Hi, I have a cluster running on CDH5.2.1 and I have a Mesos cluster (version 0.18.1). Through an Oozie Java action I want to submit a Spark job to the Mesos cluster. Before configuring it as an Oozie job, I'm testing the Java action from the command line and getting the exception below. While running I'm
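A minimal sketch of pointing a Spark driver at a Mesos master, which is what the Java action ultimately has to do; the master URL and executor URI below are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("OozieLaunchedJob")
  .setMaster("mesos://mesos-master:5050")                      // placeholder host:port
  .set("spark.executor.uri", "hdfs:///spark/spark-1.2.0.tgz")  // where Mesos executors fetch Spark from
val sc = new SparkContext(conf)
```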

Parallel spark jobs on mesos cluster

2014-09-30 Thread Sarath Chandra
Hi All, I have a requirement to process a set of files in parallel, so I'm submitting Spark jobs using Java's ExecutorService. But when I do it this way, one or more jobs fail with status EXITED. Earlier I tried with a standalone Spark cluster setting the job scheduling to Fair Scheduling. I
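One alternative to launching a separate driver per file is to share a single SparkContext across threads with the FAIR scheduler; a rough sketch, with illustrative pool name, paths, and per-file work.

```scala
import java.util.concurrent.Executors
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("ParallelFiles").set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

val paths = Seq("hdfs:///in/a.txt", "hdfs:///in/b.txt")   // placeholder input files
val pool  = Executors.newFixedThreadPool(paths.size)

paths.foreach { path =>
  pool.submit(new Runnable {
    def run(): Unit = {
      sc.setLocalProperty("spark.scheduler.pool", "files")      // jobs in this pool share fairly
      println(s"$path has ${sc.textFile(path).count()} lines")  // stand-in for the real per-file processing
    }
  })
}
pool.shutdown()
```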

Parallel spark jobs on standalone cluster

2014-09-25 Thread Sarath Chandra
Hi All, I have a Java program which submits a Spark job to a standalone Spark cluster (2 nodes; 10 cores (6+4); 12GB (8+4)). It is called by another Java program through ExecutorService, which invokes it multiple times with different sets of arguments and parameters. I have set spark memory
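When several applications are submitted to a standalone cluster at once, each one needs a cores/memory cap so the others can also get resources; a sketch with illustrative values (the master URL is a placeholder, the cluster sizes are from the post).

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://master:7077")     // placeholder standalone master URL
  .setAppName("PerFileJob")
  .set("spark.cores.max", "2")          // leave cores free for the other concurrent applications
  .set("spark.executor.memory", "2g")   // keep total memory within the cluster's 12GB
```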

Saving RDD with array of strings

2014-09-21 Thread Sarath Chandra
Hi All, If my RDD has an array/sequence of strings, how can I save it as an HDFS file with each string on a separate line? For example, if I write code as below, the output should get saved as an HDFS file having one string per line ... ... var newLines = lines.map(line => myfunc(line));
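One way to get one string per output line, assuming myfunc returns a sequence of strings: flatMap flattens the per-line sequences before saveAsTextFile writes them. The paths are placeholders and `sc`/`myfunc` are the poster's own context and function.

```scala
val lines = sc.textFile("hdfs:///in/data.txt")       // placeholder input path
val newLines = lines.flatMap(line => myfunc(line))   // one output element per string
newLines.saveAsTextFile("hdfs:///out/result")        // each element becomes one line in the output files
```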

Re: Task not serializable

2014-09-10 Thread Sarath Chandra
should show your code since it may not be doing what you think. If you instantiate an object, it happens every time your function is called. map() is called once per data element; mapPartitions() once per partition. It depends. On Wed, Sep 10, 2014 at 3:25 PM, Sarath Chandra sarathchandra.jos

Re: Task not serializable

2014-09-06 Thread Sarath Chandra
it to workers. In the second, you're creating SomeUnserializableManagerClass in the function and therefore on the worker. mapPartitions is better if this creation is expensive. On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: Hi, I'm trying
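A sketch of the pattern this reply describes, with `lines` standing in for the input RDD and SomeUnserializableManagerClass for the real class: constructing it inside mapPartitions means it is built once per partition, on the worker, and never has to be serialized from the driver.

```scala
val processed = lines.mapPartitions { iter =>
  val mgr = new SomeUnserializableManagerClass()   // created on the worker, once per partition
  iter.map(line => mgr.process(line))              // reused for every line in the partition
}
```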

Task not serializable

2014-09-05 Thread Sarath Chandra
Hi, I'm trying to migrate a map-reduce program to work with Spark. I migrated the program from Java to Scala. The map-reduce program basically loads an HDFS file, and for each line in the file it applies several transformation functions available in various external libraries. When I execute this

Re: Task not serializable

2014-09-05 Thread Sarath Chandra
: You can bring those classes out of the library and make them Serializable (implement Serializable). It is not the right way of doing it, though it solved a few of my similar problems. Thanks Best Regards On Fri, Sep 5, 2014 at 7:36 PM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote
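A hedged variant of the workaround suggested here, for cases where the library class cannot be edited: wrap it, mark the wrapper Serializable, and rebuild the inner object lazily on each executor. ExternalTransformer and its transform method are stand-in names, and `lines` is the input RDD from the thread.

```scala
class TransformerWrapper extends Serializable {
  // @transient keeps the non-serializable object out of the serialized closure;
  // lazy val rebuilds it on first use on each executor
  @transient lazy val inner = new ExternalTransformer()
  def apply(line: String): String = inner.transform(line)
}

val wrapper = new TransformerWrapper()
val transformed = lines.map(line => wrapper(line))
```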

Re: Simple record matching using Spark SQL

2014-07-17 Thread Sarath Chandra
, Jul 17, 2014 at 1:13 PM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: No Sonal, I'm not making any explicit call to stop the context. If you see my previous post to Michael, the commented portion of the code is my requirement. When I run this on the standalone spark cluster

Simple record matching using Spark SQL

2014-07-16 Thread Sarath Chandra
Hi All, I'm trying to do a simple record matching between 2 files and wrote the following code - *import org.apache.spark.sql.SQLContext;* *import org.apache.spark.rdd.RDD* *object SqlTest {* * case class Test(fld1:String, fld2:String, fld3:String, fld4:String, fld4:String, fld5:Double,
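A compact sketch of the record matching the thread attempts, written against the Spark 1.x SQLContext API it uses (exact method names shifted slightly across early 1.x releases). The field names and join column are illustrative; the HDFS paths are the ones mentioned later in the thread, and `sc` is an existing SparkContext.

```scala
import org.apache.spark.sql.SQLContext

case class Rec(fld1: String, fld2: String, fld3: String, fld4: String, fld5: Double)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit RDD -> SchemaRDD conversion in Spark 1.x

def load(path: String) =
  sc.textFile(path).map(_.split(",")).map(a => Rec(a(0), a(1), a(2), a(3), a(4).toDouble))

load("hdfs://localhost:54310/user/hduser/file1.csv").registerTempTable("file1")
load("hdfs://localhost:54310/user/hduser/file2.csv").registerTempTable("file2")

val matched = sqlContext.sql(
  "SELECT f1.* FROM file1 f1 JOIN file2 f2 ON f1.fld1 = f2.fld1")
```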

Re: Simple record matching using Spark SQL

2014-07-16 Thread Sarath Chandra
are info messages. What else do I need to check? ~Sarath On Wed, Jul 16, 2014 at 7:23 PM, Soumya Simanta soumya.sima...@gmail.com wrote: Check your executor logs for the output, or if your data is not big, collect it in the driver and print it. On Jul 16, 2014, at 9:21 AM, Sarath Chandra
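The reply's suggestion in one line, assuming the matched result (from the sketch above, or any small RDD) fits in driver memory:

```scala
matched.collect().foreach(println)   // output appears in the driver console, not in executor logs
```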

Re: Simple record matching using Spark SQL

2014-07-16 Thread Sarath Chandra
On Wed, Jul 16, 2014 at 7:48 PM, Soumya Simanta soumya.sima...@gmail.com wrote: When you submit your job, it should appear on the Spark UI. Same with the REPL. Make sure your job is submitted to the cluster properly. On Wed, Jul 16, 2014 at 10:08 AM, Sarath Chandra sarathchandra.jos

Re: Simple record matching using Spark SQL

2014-07-16 Thread Sarath Chandra
at 7:59 PM, Soumya Simanta soumya.sima...@gmail.com wrote: Can you try submitting a very simple job to the cluster. On Jul 16, 2014, at 10:25 AM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: Yes it is appearing on the Spark UI, and remains there with state as RUNNING till

Re: Simple record matching using Spark SQL

2014-07-16 Thread Sarath Chandra
On Wed, Jul 16, 2014 at 8:14 PM, Michael Armbrust mich...@databricks.com wrote: What if you just run something like: *sc.textFile("hdfs://localhost:54310/user/hduser/file1.csv").count()* On Wed, Jul 16, 2014 at 10:37 AM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: Yes