RDD storage in Spark Streaming

2015-03-23 Thread abhi
Hi,
I have a simple question about RDD creation. When an RDD is created in
Spark Streaming for a particular time window, where does that RDD get
stored?

1. Does it get stored on the driver machine, or is it distributed across all
the machines in the cluster?
2. Is the data stored in memory by default? Can it be stored in both memory
and on disk? How can this be configured? (See the sketch below.)
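
A minimal illustrative sketch of the configuration asked about in (2), assuming
the Java streaming API (the stream source, host, and port are made up, not from
this thread). The batch RDDs themselves live on the executors, not the driver:

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StorageLevelSketch {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("StorageLevelSketch");
    JavaStreamingContext jssc =
        new JavaStreamingContext(conf, Durations.seconds(10));

    // Receiver-based input streams default to MEMORY_AND_DISK_SER_2;
    // persist() without arguments uses MEMORY_ONLY_SER.
    JavaReceiverInputDStream<String> lines =
        jssc.socketTextStream("localhost", 9999);

    // Keep each batch RDD in memory and spill to disk when memory runs out.
    JavaDStream<String> cached = lines.persist(StorageLevels.MEMORY_AND_DISK);

    cached.print();
    jssc.start();
    jssc.awaitTermination();
  }
}
{code}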


Thanks,
Abhi


Priority queue in spark

2015-03-16 Thread abhi
Hi,
Currently all the jobs in Spark get submitted using a queue. I have a
requirement where a submitted job will generate another set of jobs, each with
some priority, which should again be submitted to the Spark cluster based on
priority. That is, a job with higher priority should be executed first. Is
this feasible?

Any help is appreciated.

Thanks,
Abhi


Re: Priority queue in spark

2015-03-16 Thread abhi
If I understand correctly, the above document creates pools for priorities,
which are static in nature and have to be defined before submitting the job.
In my scenario each generated task can have a different priority.

Thanks,
Abhi


On Mon, Mar 16, 2015 at 9:48 PM, twinkle sachdeva 
twinkle.sachd...@gmail.com wrote:

 Hi,

 Maybe this is what you are looking for :
 http://spark.apache.org/docs/1.2.0/job-scheduling.html#fair-scheduler-pools
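
 For what it's worth, here is a minimal sketch (illustrative only; the pool
 names, file path, and input paths are assumptions, not from this thread) of
 how jobs are assigned to pools once the pools have been defined in
 fairscheduler.xml:

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PoolsSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("PoolsSketch")
        .set("spark.scheduler.mode", "FAIR")
        // Pools and their weights are declared up front in this file.
        .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Jobs submitted from this thread run in the "highPriority" pool...
    sc.setLocalProperty("spark.scheduler.pool", "highPriority");
    long important = sc.textFile("/data/important").count();

    // ...and clearing the property sends later jobs back to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", null);
    long background = sc.textFile("/data/background").count();

    System.out.println(important + " / " + background);
    sc.stop();
  }
}
{code}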

 Thanks,

 On Mon, Mar 16, 2015 at 8:15 PM, abhi abhishek...@gmail.com wrote:

 Hi,
 Currently all the jobs in Spark get submitted using a queue. I have a
 requirement where a submitted job will generate another set of jobs, each
 with some priority, which should again be submitted to the Spark cluster
 based on priority. That is, a job with higher priority should be executed
 first. Is this feasible?

 Any help is appreciated.

 Thanks,
 Abhi






Re: Priority queue in spark

2015-03-16 Thread abhi
Yes.
Each generated job can have a different priority. It is like a recursive
function: in each iteration, the generated job will be submitted to the Spark
cluster based on its priority. Jobs with lower priority, or with priority
below some threshold, will be discarded.
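
A rough driver-side sketch of that control flow (not from this thread; the
priority scale, threshold, and job payloads are made-up placeholders, and in
real code each Runnable would wrap a Spark action, possibly tied to a
scheduler pool):

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityDriverSketch {

  static class JobDesc {
    final int priority;    // larger value = more important (an assumption)
    final Runnable work;   // in real code this would wrap a Spark action
    JobDesc(int priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }
  }

  public static void main(String[] args) {
    final int threshold = 3;   // jobs below this priority are simply dropped
    PriorityQueue<JobDesc> queue = new PriorityQueue<>(
        Comparator.comparingInt((JobDesc j) -> j.priority).reversed());

    // Seed job; running it could enqueue further jobs with their own priorities.
    queue.add(new JobDesc(10, () -> System.out.println("seed job")));

    while (!queue.isEmpty()) {
      JobDesc job = queue.poll();
      if (job.priority < threshold) {
        continue;              // discard low-priority work
      }
      job.work.run();          // a real driver would submit the Spark job here
      // e.g. queue.add(new JobDesc(5, ...)) for generated follow-up jobs
    }
  }
}
{code}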

Thanks,
Abhi


On Mon, Mar 16, 2015 at 10:36 PM, twinkle sachdeva 
twinkle.sachd...@gmail.com wrote:

 Hi Abhi,

 Do you mean that each task of a job can have a different priority, or that
 jobs generated via one job can have different priorities?



 On Tue, Mar 17, 2015 at 11:04 AM, Mark Hamstra m...@clearstorydata.com
 wrote:


 http://apache-spark-developers-list.1001551.n3.nabble.com/Job-priority-td10076.html#a10079

 On Mon, Mar 16, 2015 at 10:26 PM, abhi abhishek...@gmail.com wrote:

 If I understand correctly, the above document creates pools for priorities,
 which are static in nature and have to be defined before submitting the job.
 In my scenario each generated task can have a different priority.

 Thanks,
 Abhi


 On Mon, Mar 16, 2015 at 9:48 PM, twinkle sachdeva 
 twinkle.sachd...@gmail.com wrote:

 Hi,

 Maybe this is what you are looking for :
 http://spark.apache.org/docs/1.2.0/job-scheduling.html#fair-scheduler-pools

 Thanks,

 On Mon, Mar 16, 2015 at 8:15 PM, abhi abhishek...@gmail.com wrote:

 Hi,
 Currently all the jobs in Spark get submitted using a queue. I have a
 requirement where a submitted job will generate another set of jobs, each
 with some priority, which should again be submitted to the Spark cluster
 based on priority. That is, a job with higher priority should be executed
 first. Is this feasible?

 Any help is appreciated.

 Thanks,
 Abhi









Re: Issue with yarn cluster - hangs in accepted state.

2015-03-15 Thread abhi
Thanks,
It worked.

-Abhi

On Tue, Mar 3, 2015 at 5:15 PM, Tobias Pfeiffer t...@preferred.jp wrote:

 Hi,

 On Wed, Mar 4, 2015 at 6:20 AM, Zhan Zhang zzh...@hortonworks.com wrote:

  Do you have enough resources in your cluster? You can check your resource
 manager to see the usage.


 Yep, I can confirm that this is a very annoying issue. If there is not
 enough memory or VCPUs available, your app will just stay in ACCEPTED state
 until resources are available.

 You can have a look at

 https://github.com/jubatus/jubaql-docker/blob/master/hadoop/yarn-site.xml#L35
 to see some settings that might help.
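
 For instance (an illustrative suggestion, not from the original thread),
 requesting less memory and fewer cores at submit time often gets a small
 test app past ACCEPTED on a modest cluster:

{code}
bin/spark-submit --class com.mycompany.app.SimpleApp \
  --master yarn-cluster \
  --driver-memory 512m \
  --executor-memory 512m \
  --num-executors 1 \
  --executor-cores 1 \
  /home/hduser/my-app-1.0.jar
{code}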

 Tobias





Issue with yarn cluster - hangs in accepted state.

2015-03-03 Thread abhi
I am trying to run the below Java class on a YARN cluster, but it hangs in
the ACCEPTED state. I don't see any error. Below are the class and the
command. Any help is appreciated.


Thanks,

Abhi





bin/spark-submit --class com.mycompany.app.SimpleApp --master yarn-cluster
/home/hduser/my-app-1.0.jar


{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {

  public static void main(String[] args) {

    String logFile = "/home/hduser/testspark.txt"; // Should be some file on your system

    SparkConf conf = new SparkConf().setAppName("Simple Application");

    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> logData = sc.textFile(logFile).cache();

    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
  }
}

{code}


15/03/03 11:47:40 INFO yarn.Client: Application report for
application_1425398386987_0002 (state: ACCEPTED)

15/03/03 11:47:41 INFO yarn.Client: Application report for
application_1425398386987_0002 (state: ACCEPTED)

15/03/03 11:47:42 INFO yarn.Client: Application report for
application_1425398386987_0002 (state: ACCEPTED)

[... the same report repeats every second ...]

15/03/03 11:48:04 INFO yarn.Client: Application report for
application_1425398386987_0002 (state: ACCEPTED)


unsubscribe

2015-01-28 Thread Abhi Basu
-- 
Abhi Basu


SparkSQL

2015-01-08 Thread Abhi Basu
I am working with CDH 5.2 (Spark 1.0.0) and am wondering which version of
Spark comes with SparkSQL by default. Also, does SparkSQL come enabled to
access the Hive metastore? Is there an easier way to enable Hive support
without having to build the code with various switches?
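
For context, "Hive support" here means being able to do something like the
following (a hedged Java sketch, assuming a Spark 1.2-era build that includes
the Hive module; the table name is made up):

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("HiveContextSketch");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Reads table definitions from the Hive metastore configured in hive-site.xml.
    HiveContext hiveContext = new HiveContext(sc.sc());
    hiveContext.sql("SELECT count(*) FROM my_table");

    sc.stop();
  }
}
{code}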

Thanks,

Abhi

-- 
Abhi Basu


Re: Building Desktop application for ALS-MlLib/ Training ALS

2014-12-15 Thread Abhi Basu
In case you must write C# code, you can call Python code from C# or use
IronPython. :)

On Mon, Dec 15, 2014 at 12:04 PM, Xiangrui Meng men...@gmail.com wrote:

 On Sun, Dec 14, 2014 at 3:06 AM, Saurabh Agrawal
 saurabh.agra...@markit.com wrote:
 
 
  Hi,
 
 
 
  I am a newbie in the Spark and Scala world.
 
 
 
  I have been trying to implement collaborative filtering using MLlib,
  supplied out of the box with Spark and Scala.
 
 
 
  I have 2 problems
 
 
 
  1. The best model was trained with rank = 20, lambda = 5.0, and numIter = 10,
  and its RMSE on the test set is 25.718710831912485. The best model improves
  the baseline by 18.29%. Is there a scientific way in which RMSE could be
  brought down? What is a decent, acceptable value for RMSE?
 

 The grid search approach used in the AMPCamp tutorial is pretty
 standard. Whether an RMSE is good or not really depends on your
 dataset.
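
 A hedged sketch of that grid search in Java (MLlib 1.x); the candidate
 parameter values are arbitrary examples, and computeRmse() is a hypothetical
 helper along the lines of the one in the AMPCamp exercise:

{code}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

public class AlsGridSearchSketch {

  public static MatrixFactorizationModel gridSearch(JavaRDD<Rating> training,
                                                    JavaRDD<Rating> validation) {
    int[] ranks = {8, 12, 20};
    double[] lambdas = {0.1, 1.0, 5.0, 10.0};
    int[] numIters = {10, 20};

    MatrixFactorizationModel bestModel = null;
    double bestRmse = Double.MAX_VALUE;

    // Train one model per parameter combination and keep the one with the
    // lowest RMSE on the held-out validation set.
    for (int rank : ranks) {
      for (double lambda : lambdas) {
        for (int numIter : numIters) {
          MatrixFactorizationModel model =
              ALS.train(training.rdd(), rank, numIter, lambda);
          double rmse = computeRmse(model, validation);  // hypothetical helper
          if (rmse < bestRmse) {
            bestRmse = rmse;
            bestModel = model;
          }
        }
      }
    }
    return bestModel;
  }

  // Stub only: in practice, predict ratings for the validation (user, product)
  // pairs and compare against the true ratings, as in the AMPCamp exercise.
  private static double computeRmse(MatrixFactorizationModel model,
                                    JavaRDD<Rating> validation) {
    throw new UnsupportedOperationException("left as an exercise");
  }
}
{code}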

  2. I picked up the collaborative filtering algorithm from
  http://ampcamp.berkeley.edu/5/exercises/movie-recommendation-with-mllib.html
  and executed the given code with my dataset. Now, I want to build a desktop
  application around it.

  a. What is the best language to do this, Java or Scala? Is there any
  possibility of doing this using C#?
 

 We support Java/Scala/Python. Start with the one you are most
 familiar with. C# is not supported.

  b. Can somebody please share any relevant documents, sources, or helper
  links to help me get started on this?
 

 For ALS, you can check the API documentation.

 
 
  Your help is greatly appreciated
 
 
 
  Thanks!!
 
 
 
  Regards,
 
  Saurabh Agrawal
 
 
  




-- 
Abhi Basu


Re: sbt assembly with hive

2014-12-12 Thread Abhi Basu
I am getting the same message when trying to get a HiveContext in CDH 5.1
after enabling Spark. I think Spark should come with Hive enabled by default,
as the Hive metastore is a common way to share data, given the popularity of
Hive and other SQL-over-Hadoop technologies like Impala.

Thanks,

Abhi

On Fri, Dec 12, 2014 at 6:40 PM, Stephen Boesch java...@gmail.com wrote:


 What is the proper way to build with Hive from sbt? SPARK_HIVE is
 deprecated. However, after running the following:

sbt -Pyarn -Phadoop-2.3 -Phive  assembly/assembly

 And then
   bin/pyspark

hivectx = HiveContext(sc)

hivectx.hiveql("select * from my_table")

 Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
 run sbt/sbt assembly", Py4JError(u'Trying to call a package.',))



-- 
Abhi Basu