Can we make EdgeRDD and VertexRDD storage level to MEMORY_AND_DISK?

2014-11-19 Thread Harihar Nahak
Hi, 

I'm running out of memory when I run a GraphX program on a dataset larger than
10 GB. Normal Spark operations handled this dataset fine once I used
StorageLevel.MEMORY_AND_DISK.

With GraphX I found that only in-memory storage is allowed, because the Graph
constructor sets this property by default. When I changed the storage level to
what I need, it was not allowed and threw an error message saying the
StorageLevel cannot be modified once it is already set.

Please help me with these queries:
1. How do I override the current storage level to MEMORY_AND_DISK?
2. If it is not possible through the constructor, what if I modify the
Graph.scala class and rebuild it to make it work? If I do that, is there
anything else I need to know?

Thanks   






How to get list of edges between two Vertex ?

2014-11-19 Thread Harihar Nahak
Hi,
I have a graph where there can be more than one edge between the same two
vertices. Now I need to find the top vertex pairs between which the most calls
happen.

The output should look like (V1, V2, No. of edges).
So I need to know how to find the total number of edges between just those two
vertices.
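A minimal GraphX sketch of one way to do this (the `graph` value and the top-10 cutoff are assumptions, not part of the original setup): count the parallel edges per (src, dst) pair and sort by that count.

import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark 1.x
import org.apache.spark.graphx._

// Sketch, assuming `graph: Graph[VD, ED]` was built without merging parallel edges
val edgeCounts = graph.edges
  .map(e => ((e.srcId, e.dstId), 1))
  .reduceByKey(_ + _)                              // ((V1, V2), no. of edges)
  .map { case ((src, dst), n) => (src, dst, n) }   // (V1, V2, no. of edges)

// Vertex pairs with the most edges (calls) between them, highest first
val topPairs = edgeCounts.sortBy(_._3, ascending = false).take(10)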







Re: Can we make EdgeRDD and VertexRDD storage level to MEMORY_AND_DISK?

2014-11-19 Thread Harihar Nahak
Just figured it out: using the Graph constructor you can pass the storage level
for both the edge and vertex RDDs:
Graph.fromEdges(edges, defaultValue = (,), StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK)

Thanks to this post: https://issues.apache.org/jira/browse/SPARK-1991
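For completeness, a minimal sketch of that constructor call (the edge-building code and file path are assumptions; `sc` is the SparkContext). The parameter names match the edgeStorageLevel / vertexStorageLevel arguments added by SPARK-1991:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch: build edges from a whitespace-separated "src dst" file, then construct the
// graph with both the edge RDD and the vertex RDD allowed to spill to disk.
val edges: RDD[Edge[Int]] = sc.textFile("hdfs:///data/edges.txt").map { line =>
  val parts = line.split("\\s+")
  Edge(parts(0).toLong, parts(1).toLong, 1)
}
val graph = Graph.fromEdges(
  edges,
  defaultValue = 0,   // default vertex attribute
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)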






Re: How to join two RDDs with mutually exclusive keys

2014-11-20 Thread Harihar Nahak
I have a similar type of issue: I want to join two RDDs of different types into
one RDD.

file1.txt content (ID, count):
val x: RDD[(Long, Int)] = sc.textFile("file1.txt").map(line =>
  line.split(",")).map(row => (row(0).toLong, row(1).toInt))
[(4407, 40),
(2064, 38),
(7815, 10),
(5736, 17),
(8031, 3)]

Second RDD, from file2.txt, contains (ID, name), where ID is common to both RDDs:
val y: RDD[(Long, String)]
[(4407, "Jhon"),
(2064, "Maria"),
(7815, "Casto"),
(5736, "Ram"),
(8031, "XYZ")]

and I'm expecting the result to look like this: [(ID, Name, Count)]
[(4407, "Jhon", 40),
(2064, "Maria", 38),
(7815, "Casto", 10),
(5736, "Ram", 17),
(8031, "XYZ", 3)]


Any help would be really appreciated. Thanks
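A minimal sketch of one way to get that output, assuming x and y as defined above (note this is a keyed join, not the union suggested in the reply below):

import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark 1.x

// (ID, (name, count)) -> (ID, name, count)
val z = y.join(x).map { case (id, (name, count)) => (id, name, count) }
// e.g. (4407, "Jhon", 40), (2064, "Maria", 38), ...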




On 21 November 2014 09:18, dsiegmann [via Apache Spark User List] 
ml-node+s1001560n19419...@n3.nabble.com wrote:

 You want to use RDD.union (or SparkContext.union for many RDDs). These
 don't join on a key. Union doesn't really do anything itself, so it is low
 overhead. Note that the combined RDD will have all the partitions of the
 original RDDs, so you may want to coalesce after the union.

 val x = sc.parallelize(Seq( (1, 3), (2, 4) ))
 val y = sc.parallelize(Seq( (3, 5), (4, 7) ))
 val z = x.union(y)

 z.collect
 res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))


 On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith [hidden email]
 http://user/SendEmail.jtp?type=nodenode=19419i=0 wrote:

 Say I have two RDDs with the following values

 x = [(1, 3), (2, 4)]

 and

 y = [(3, 5), (4, 7)]

 and I want to have

 z = [(1, 3), (2, 4), (3, 5), (4, 7)]

 How can I achieve this. I know you can use outerJoin followed by map to
 achieve this, but is there a more direct way for this.




 --
 Daniel Siegmann, Software Developer
 Velos
 Accelerating Machine Learning

 54 W 40th St, New York, NY 10018
 E: [hidden email] http://user/SendEmail.jtp?type=nodenode=19419i=1 W:
 www.velos.io






-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: How to join two RDDs with mutually exclusive keys

2014-11-20 Thread Harihar Nahak
Thanks Daniel,

I applied join from PairRDDFunctions:

val countByUsername = file1.join(file2)
  .map {
    case (id, (username, count)) => (id, username, count)
  }








Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-24 Thread Harihar Nahak
Hi All, 

I started exploring Spark over the past 2 months. I'm looking at concrete
features of both Spark and GraphX so that I can decide which to use, based on
which gets the highest performance.

According to the documentation GraphX runs 10x faster than normal Spark, so I
ran the PageRank algorithm with both:
For Spark I used:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPageRank.scala
For GraphX I used:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/graphx/LiveJournalPageRank.scala

Input data: http://snap.stanford.edu/data/soc-LiveJournal1.html (1 GB in size)
No. of iterations: 2

*Time taken:*

Local mode (machine: 8 cores; 16 GB memory; 2.80 GHz Intel i7; executor
memory: 4 GB; no. of partitions: 50; no. of iterations: 2):

*Spark PageRank took - 21.29 mins
GraphX PageRank took - 42.01 mins*

Cluster mode (Ubuntu 12.04; Spark 1.1 / Hadoop 2.4 cluster; 3 workers, 1
driver, 8 cores, 30 GB memory) (executor memory 4 GB; no. of edge partitions:
50; random vertex cut; no. of iterations: 2):

*Spark PageRank took - 10.54 mins
GraphX PageRank took - 7.54 mins*


Could you please help me determine when to use Spark and when to use GraphX? If
GraphX takes the same amount of time as Spark, then it seems better to use
Spark, because Spark has a variety of operators to deal with any type of RDD.

Any suggestions, feedback, or pointers will be highly appreciated.

Thanks,


 






Re: Is there a way to turn on spark eventLog on the worker node?

2014-11-24 Thread Harihar Nahak
You can set the same parameters when launching an application: if you use
spark-submit, try --conf to pass those variables, or you can also set them via
SparkConf. That enables the event logs for both the driver and the workers.
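For example (the log directory here is an assumption), the event log settings can go on the SparkConf, or equivalently be passed as --conf flags to spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enable the application event log; the same keys can be passed as
// spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///spark-events
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-events")
val sc = new SparkContext(conf)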






Re: Configuring custom input format

2014-11-25 Thread Harihar Nahak
Hi, 

I'm trying to make a custom input format for CSV files. If you can share a
little bit more about what you read as input and what you have implemented,
I'll try to replicate the same thing. If I find something interesting on my
end I'll let you know.

Thanks,
Harihar






Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-26 Thread Harihar Nahak
Hi guys,

Has anyone else experienced the same thing as described above?






Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-27 Thread Harihar Nahak
Thanks Ankur, that's really helpful. I have a few queries on optimization
techniques. For the current run I used the RandomVertexCut partition strategy.

But which partition strategy should be used if (see the partitionBy sketch
after this list):
1. The number of edges in the edge list file is very large, say 50,000,000,
and there are many parallel edges between the same pair of vertices?
2. The number of unique vertices is very large, say 10,000,000, in the above
edge list file?
3. The number of unique vertices is small, say less than 100,000, in the above
edge list file?
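For reference, a minimal sketch of switching strategies on an already-loaded graph (the strategy and partition count below are placeholders, not a recommendation for any of the three cases above):

import org.apache.spark.graphx._

// Sketch: repartition edges with one of the built-in strategies.
// CanonicalRandomVertexCut co-locates all edges between the same vertex pair regardless
// of direction (useful with many parallel edges); EdgePartition2D bounds vertex
// replication to roughly 2 * sqrt(numPartitions).
val repartitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D, 50)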





On 27 November 2014 at 20:23, ankurdave [via Apache Spark User List] 
ml-node+s1001560n1995...@n3.nabble.com wrote:

 At 2014-11-24 19:02:08 -0800, Harihar Nahak [hidden email]
 http://user/SendEmail.jtp?type=nodenode=19956i=0 wrote:

  According to documentation GraphX runs 10x faster than normal Spark. So
 I
  run Page Rank algorithm in both the applications:
  [...]
  Local Mode (Machine : 8 Core; 16 GB memory; 2.80 Ghz Intel i7; Executor
  Memory: 4Gb, No. of Partition: 50; No. of Iterations: 2);   ==
 
  *Spark Page Rank took - 21.29 mins
  GraphX Page Rank took - 42.01 mins *
 
  Cluster Mode (ubantu 12.4; spark 1.1/hadoop 2.4 cluster ; 3 workers , 1
  driver , 8 cores, 30 gb memory) (Executor memory 4gb; No. of edge
 partitions
  : 50, random vertex cut ; no. of iteration : 2) =
 
  *Spark Page Rank took - 10.54 mins
  GraphX Page Rank took - 7.54 mins *
 
  Could you please help me to determine, when to use Spark and GraphX ? If
  GraphX took same amount of time than Spark then its better to use Spark
  because spark has variey of operators to deal with any type of RDD.

 If you have a problem that's naturally expressible as a graph computation,
 it makes sense to use GraphX in my opinion. In addition to the
 optimizations that GraphX incorporates which you would otherwise have to
 implement manually, GraphX's programming model is likely a better fit. But
 even if you start off by using pure Spark, you'll still have the
 flexibility to use GraphX for other parts of the problem since it's part of
 the same system.

 To address the benchmark results you got:

 1. GraphX takes more time than Spark to load the graph, because it has to
 index it, but subsequent iterations should be faster. We benchmarked with
 20 iterations to show this effect, but you only used 2 iterations, which
 doesn't give much time to amortize the loading cost.

 2. The benchmarks in the GraphX OSDI paper are against a naive
 implementation of PageRank in Spark, while the version you benchmarked
 against has some of the same optimizations as GraphX does. I believe we
 found that the optimized Spark PageRank was only 3x slower than GraphX.

 3. When running those benchmarks, we used an experimental version of Spark
 with in-memory shuffle, which disproportionately benefits GraphX since its
 shuffle files are smaller due to specialized compression.

 4. We haven't optimized GraphX for local mode, so it's not surprising that
 it's slower there.

 Ankur

 -
 To unsubscribe, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=19956i=1
 For additional commands, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=19956i=2







-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: Lifecycle of RDD in spark-streaming

2014-11-27 Thread Harihar Nahak
When new data comes in on a stream, Spark uses the streaming classes to
convert it into an RDD, and as you mention it is then followed by
transformations and finally actions. As far as I have experienced, the RDDs
remain in memory until the user destroys them or the application ends.


On 26 November 2014 at 20:05, Mukesh Jha [via Apache Spark User List] 
ml-node+s1001560n19835...@n3.nabble.com wrote:

 Any pointers guys?

 On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha [hidden email]
 http://user/SendEmail.jtp?type=nodenode=19835i=0 wrote:

 Hey Experts,

 I wanted to understand in detail about the lifecycle of rdd(s) in a
 streaming app.

 From my current understanding
 - rdd gets created out of the realtime input stream.
 - Transform(s) functions are applied in a lazy fashion on the RDD to
 transform into another rdd(s).
 - Actions are taken on the final transformed rdds to get the data out of
 the system.

 Also rdd(s) are stored in the clusters RAM (disc if configured so) and
 are cleaned in LRU fashion.

 So I have the following questions on the same.
 - How spark (streaming) guarantees that all the actions are taken on each
 input rdd/batch.
 - How does spark determines that the life-cycle of a rdd is complete. Is
 there any chance that a RDD will be cleaned out of ram before all actions
 are taken on them?

 Thanks in advance for all your help. Also, I'm relatively new to scala 
 spark so pardon me in case these are naive questions/assumptions.

 --
 Thanks  Regards,

 *[hidden email] http://user/SendEmail.jtp?type=nodenode=19835i=1*




 --


 Thanks  Regards,

 *[hidden email] http://user/SendEmail.jtp?type=nodenode=19835i=2*






-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: Multiple SparkContexts in same Driver JVM

2014-11-30 Thread Harihar Nahak
Try setting it on the SparkConf: conf.set("spark.driver.allowMultipleContexts", "true") (see the sketch below).
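A minimal sketch of where that setting has to go (it belongs on the SparkConf used to build the context, not in spark-env.sh; note the flag only suppresses the check, and multiple contexts per JVM remain officially unsupported):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: suppress the "Only one SparkContext may be running in this JVM" check
val conf = new SparkConf()
  .setAppName("first-context")
  .set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)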

On 30 November 2014 at 17:37, lokeshkumar [via Apache Spark User List] 
ml-node+s1001560n20037...@n3.nabble.com wrote:

 Hi Forum,

 Is it not possible to run multiple SparkContexts concurrently without
 stopping the other one in the spark 1.3.0.
 I have been trying this out and getting the below error.

 Caused by: org.apache.spark.SparkException: Only one SparkContext may be
 running in this JVM (see SPARK-2243). To ignore this error, set
 spark.driver.allowMultipleContexts = true. The currently running
 SparkContext was created at:

 According to this, its not possible to create unless we specify the option
 spark.driver.allowMultipleContexts = true.

 So is there a way to create multiple concurrently running SparkContext in
 same JVM or should we trigger Driver processes in different JVMs to do the
 same?

 Also please let me know where the option
 'spark.driver.allowMultipleContexts' to be set? I have set it in
 spark-env.sh SPARK_MASTER_OPTS but no luck.





-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: RDDs join problem: incorrect result

2014-11-30 Thread Harihar Nahak
What do you mean by "incorrect"? Could you please share some examples from
both input RDDs and from the resulting RDD? Also, if you get any exception,
paste that too; it helps to debug where the issue is.

On 27 November 2014 at 17:07, liuboya [via Apache Spark User List] 
ml-node+s1001560n19928...@n3.nabble.com wrote:

 Hi,
I ran into a problem when doing two RDDs join operation. For example,
 RDDa: RDD[(String,String)] and RDDb:RDD[(String,Int)]. Then, the result
 RDDc:[String,(String,Int)] = RDDa.join(RDDb). But I find the results in
 RDDc are  incorrect compared with RDDb. What's wrong in join?





-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: GraphX:java.lang.NoSuchMethodError:org.apache.spark.graphx.Graph$.apply

2014-11-30 Thread Harihar Nahak
Hi, if you haven't figured it out so far, could you please share some details
about how you are running GraphX?

Also, before executing the above commands from the shell, import the required
GraphX packages (see the sketch below).
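For reference, a minimal shell session (the sample data is made up); a NoSuchMethodError on Graph$.apply can also mean the GraphX jar on the classpath does not match the running Spark version, so that is worth checking too:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Sketch: build a tiny graph with Graph.apply from a vertex RDD and an edge RDD
val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq((1L, "a"), (2L, "b")))
val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(1L, 2L, 1)))
val graph = Graph(vertices, edges)
graph.vertices.count()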

On 27 November 2014 at 20:49, liuboya [via Apache Spark User List] 
ml-node+s1001560n19959...@n3.nabble.com wrote:

 I'm waiting online. Who can help me, please?





-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: Edge List File in GraphX

2014-11-30 Thread Harihar Nahak
Use GraphLoader.edgeListFile(sc, fileName), where the file must be in the
"1\t2" form (source and destination vertex IDs separated by whitespace).
Regarding the NaN result, there might be some issue with the data; I ran it on
various combinations of data sets and it works perfectly fine.
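A minimal sketch of loading such a file and running PageRank (the path is an assumption; edgeListFile gives every vertex and edge a default attribute of 1, so a missing vertex attribute is unlikely to be the cause of the NaN):

import org.apache.spark.graphx._

// Sketch: each line is "srcId<whitespace>dstId"; lines starting with # are skipped
val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edge-list.txt")

// Run PageRank to a tolerance of 0.0001 and inspect a few ranks
val ranks = graph.pageRank(0.0001).vertices
ranks.take(5).foreach(println)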

On 25 November 2014 at 19:23, pradhandeep [via Apache Spark User List] 
ml-node+s1001560n1972...@n3.nabble.com wrote:

 Hi,
 Is it necessary for every vertex to have an attribute when we load a graph
 to GraphX?
 In other words, if I have an edge list file containing pairs of vertices
 i.e., 1   2 means that there is an edge between node 1 and node 2. Now,
 when I run PageRank on this data it return a NaN.
 Can I use this type of data for any algorithm on GraphX?

 Thank You






-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: NumberFormatException

2014-12-15 Thread Harihar Nahak
Hi Yu,

Try this:
val data = csv.map(line => line.split(",").map(elem => elem.trim)) // lines into rows

data.map(rec => (rec(0).toInt, rec(1).toInt))

to convert the columns into integers.
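If some lines are malformed (blank lines, headers, stray characters), a hedged alternative is to drop them instead of letting one bad record fail the job; this is only a sketch, assuming `csv` is an RDD[String] of "int,int" lines:

import scala.util.Try

// Sketch: lines that do not parse cleanly are skipped rather than throwing NumberFormatException
val pairs = csv.flatMap { line =>
  val cols = line.split(",").map(_.trim)
  Try((cols(0).toInt, cols(1).toInt)).toOption
}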

On 16 December 2014 at 10:49, yu [via Apache Spark User List] 
ml-node+s1001560n20694...@n3.nabble.com wrote:

 Hello, everyone

 I know 'NumberFormatException' is due to the reason that String can not be
 parsed properly, but I really can not find any mistakes for my code. I hope
 someone may kindly help me.
 My hdfs file is as follows:
 8,22
 3,11
 40,10
 49,47
 48,29
 24,28
 50,30
 33,56
 4,20
 30,38
 ...

 So each line contains an integer + "," + an integer + "\n"
 My code is as follows:
 object StreamMonitor {
   def main(args: Array[String]): Unit = {
     val myFunc = (str: String) => {
       val strArray = str.trim().split(",")
       (strArray(0).toInt, strArray(1).toInt)
     }
     val conf = new SparkConf().setAppName("StreamMonitor")
     val ssc = new StreamingContext(conf, Seconds(30))
     val datastream = ssc.textFileStream("/user/yu/streaminput")
     val newstream = datastream.map(myFunc)
     newstream.saveAsTextFiles("output/", "")
     ssc.start()
     ssc.awaitTermination()
   }
 }

 The exception info is:
 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
 0.0 (TID 0, h3): java.lang.NumberFormatException: For input string: 8

 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

 java.lang.Integer.parseInt(Integer.java:492)
 java.lang.Integer.parseInt(Integer.java:527)

 scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
 scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
 StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:9)
 StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:7)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

 org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)


 org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)

 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)

 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)


 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

 java.lang.Thread.run(Thread.java:745)

 So based on the above info, 8 is the first number in the file and I
 think it should be parsed to integer without any problems.
 I know it may be a very stupid question and the answer may be very easy.
 But I really can not find the reason. I am thankful to anyone who helps!




-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: RDDs being cleaned too fast

2014-12-16 Thread Harihar Nahak
RDD.persist() can be useful here.
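For example, a small sketch (assuming the driver keeps a live reference to the RDD, since the ContextCleaner removes RDDs whose references have been garbage-collected):

import org.apache.spark.storage.StorageLevel

// Sketch: pin the RDD explicitly and release it yourself when finished
val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
cached.count()       // materialize it once
// ... reuse `cached` across jobs ...
cached.unpersist()   // drop it when no longer needed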

On 11 December 2014 at 14:34, ankits [via Apache Spark User List] 
ml-node+s1001560n20613...@n3.nabble.com wrote:

 I'm using spark 1.1.0 and am seeing persisted RDDs being cleaned up too
 fast. How can i inspect the size of RDD in memory and get more information
 about why it was cleaned up. There should be more than enough memory
 available on the cluster to store them, and by default, the
 spark.cleaner.ttl is infinite, so I want more information about why this is
 happening and how to prevent it.

 Spark just logs this when removing RDDs:

 [2014-12-11 01:19:34,006] INFO  spark.storage.BlockManager [] [] -
 Removing RDD 33
 [2014-12-11 01:19:34,010] INFO  pache.spark.ContextCleaner []
 [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
 [2014-12-11 01:19:34,012] INFO  spark.storage.BlockManager [] [] -
 Removing RDD 33
 [2014-12-11 01:19:34,016] INFO  pache.spark.ContextCleaner []
 [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33




-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: hello

2014-12-18 Thread Harihar Nahak
You mean join the Spark User List? It's pretty easy; check the first email, it
has all the instructions.

On 18 December 2014 at 21:56, csjtx1021 [via Apache Spark User List] 
ml-node+s1001560n20759...@n3.nabble.com wrote:

 i want to join you




-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: Spark GraphX question.

2014-12-18 Thread Harihar Nahak
Hi Ted,

I have no idea what transitive reduction is, but you can achieve the expected
result with the graph.subgraph(...) API, using an edge predicate that filters
edges by their weight and gives you a new graph per your condition.
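A minimal sketch of that call, assuming the edge attribute is the integer weight from the example (note this only filters by weight; it is not a true maximum-spanning-tree or transitive reduction):

import org.apache.spark.graphx._

// Sketch: keep only edges whose weight is at least 30; vertices are left untouched,
// so some may become isolated after the filter
val pruned = graph.subgraph(epred = triplet => triplet.attr >= 30)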

On 19 December 2014 at 11:11, Tae-Hyuk Ahn [via Apache Spark User List] 
ml-node+s1001560n20768...@n3.nabble.com wrote:

 Hi All,

 I am wondering what is the best way to remove transitive edges with
 maximum spanning tree. For example,

 Edges:
 1 - 2 (30)
 2 - 3 (30)
 1 - 3 (25)

 where parenthesis is a weight for each edge.

 Then, I'd like to get the reduced edges graph after Transitive Reduction
 with considering the weight as a maximum spanning tree.

 Edges:
 1 - 2 (30)
 2 - 3 (30)

 Do you have a good idea for this?

 Thanks,

 Ted





-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-12-28 Thread Harihar Nahak
Yes, I had tried that too. I took the pre-built Spark 1.1 release. If there are
changes to the GraphX library in upcoming releases, just let me know, or I can
try it on Spark 1.2.

--Harihar






Results never return to driver | Spark Custom Reader

2015-01-22 Thread Harihar Nahak
Hi All, 

I wrote a custom reader to read from a DB, and it is able to return keys and
values as expected, but after it finishes the results never return to the driver.

Here is the output of the worker log:
15/01/23 15:51:38 INFO worker.ExecutorRunner: Launch command: java -cp
::/usr/local/spark-1.2.0-bin-hadoop2.4/sbin/../conf:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop/etc/hadoop
-XX:MaxPermSize=128m -Dspark.driver.port=53484 -Xms1024M -Xmx1024M
org.apache.spark.executor.CoarseGrainedExecutorBackend
akka.tcp://sparkDriver@VM90:53484/user/CoarseGrainedScheduler 6 VM99
4 app-20150123155114-
akka.tcp://sparkWorker@VM99:44826/user/Worker
15/01/23 15:51:47 INFO worker.Worker: Executor app-20150123155114-/6
finished with state EXITED message Command exited with code 1 exitStatus 1
15/01/23 15:51:47 WARN remote.ReliableDeliverySupervisor: Association with
remote system [akka.tcp://sparkExecutor@VM99:57695] has failed, address is
now gated for [5000] ms. Reason is: [Disassociated].
15/01/23 15:51:47 INFO actor.LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40143.96.25.29%3A35065-4#-915179653]
was not delivered. [3] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
15/01/23 15:51:49 INFO worker.Worker: Asked to kill unknown executor
app-20150123155114-/6

If someone notices any clue to fixing this, it would be really appreciated.






Re: Data Locality

2015-01-28 Thread Harihar Nahak
Hi guys,

I have a similar question and doubt. How does Spark create an executor on the
same node where the data block is stored? Does it first get the block
information from the HDFS NameNode and then place the executor on the same
node, provided the Spark worker daemon is installed there?

 
  






Re: Eclipse on spark

2015-01-25 Thread Harihar Nahak
Download the pre-built binary for Windows and add all the required jars to your
project's Eclipse classpath, then go ahead with Eclipse. Make sure you have the
same Java version.
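If the project is built with sbt rather than hand-managed jars, a minimal build.sbt sketch (version numbers are assumptions matching the Spark 1.2 era of this thread) looks like this; the sbteclipse plugin can then generate the Eclipse project files:

// build.sbt sketch
name := "spark-eclipse-example"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"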

On 25 January 2015 at 07:33, riginos [via Apache Spark User List] 
ml-node+s1001560n21350...@n3.nabble.com wrote:

 How to compile a Spark project in Scala IDE for Eclipse? I got many scala
 scripts and i no longer want to load them from scala-shell what can i do?





-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019





Re: Results never return to driver | Spark Custom Reader

2015-01-25 Thread Harihar Nahak
 directory location but in that case I'd assume you know
 where to find the log)

 On Thu, Jan 22, 2015 at 10:54 PM, Harihar Nahak hna...@wynyardgroup.com
 wrote:

 Hi All,

 I wrote a custom reader to read a DB, and it is able to return key and
 value
 as expected but after it finished it never returned to driver

 here is output of worker log :
 15/01/23 15:51:38 INFO worker.ExecutorRunner: Launch command: java -cp

 ::/usr/local/spark-1.2.0-bin-hadoop2.4/sbin/../conf:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop/etc/hadoop
 -XX:MaxPermSize=128m -Dspark.driver.port=53484 -Xms1024M -Xmx1024M
 org.apache.spark.executor.CoarseGrainedExecutorBackend
 akka.tcp://sparkDriver@VM90:53484/user/CoarseGrainedScheduler 6
 VM99
 4 app-20150123155114-
 akka.tcp://sparkWorker@VM99:44826/user/Worker
 15/01/23 15:51:47 INFO worker.Worker: Executor app-20150123155114-/6
 finished with state EXITED message Command exited with code 1 exitStatus 1
 15/01/23 15:51:47 WARN remote.ReliableDeliverySupervisor: Association with
 remote system [akka.tcp://sparkExecutor@VM99:57695] has failed, address
 is
 now gated for [5000] ms. Reason is: [Disassociated].
 15/01/23 15:51:47 INFO actor.LocalActorRef: Message
 [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
 Actor[akka://sparkWorker/deadLetters] to

 Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40143.96.25.29%3A35065-4#-915179653]
 was not delivered. [3] dead letters encountered. This logging can be
 turned
 off or adjusted with configuration settings 'akka.log-dead-letters' and
 'akka.log-dead-letters-during-shutdown'.
 15/01/23 15:51:49 INFO worker.Worker: Asked to kill unknown executor
 app-20150123155114-/6

 If someone noticed any clue to fixed that will really appreciate.








-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019


Re: connector for CouchDB

2015-01-29 Thread Harihar Nahak
No, I changed to MongoDB, but you can write your own custom code to connect to
CouchDB directly; there is no such connector available on the market.

By extending a few classes you can read from CouchDB. I can help you with
that; let me know if you are really interested.

On 30 January 2015 at 06:46, prateek arora [via Apache Spark User List] 
ml-node+s1001560n21422...@n3.nabble.com wrote:

 I am also looking for connector for CouchDB in Spark. did you find
 anything ?





