Re: connector for CouchDB

2015-01-29 Thread Harihar Nahak
No, I switched to MongoDB. You can write your own custom code to connect to CouchDB directly, but there is no ready-made connector available. By extending a few classes you can read from CouchDB. I can help with that; let me know if you are really interested. On 30 January 2015 at 06:46,

Re: Data Locality

2015-01-28 Thread Harihar Nahak
Hi Guys, I have a similar question. How does Spark create an executor on the same node where a data block is stored? Does it first get the block information from the HDFS NameNode and then place the executor on that node, provided the Spark worker daemon is installed there?

Re: Eclipse on spark

2015-01-25 Thread Harihar Nahak
-- Regards, Harihar Nahak BigData Developer Wynyard Email:hna...@wynyardgroup.com | Extn: 8019

Re: Results never return to driver | Spark Custom Reader

2015-01-25 Thread Harihar Nahak
at 10:54 PM, Harihar Nahak hna...@wynyardgroup.com wrote: Hi All, I wrote a custom reader to read from a DB. It returns keys and values as expected, but after it finishes, the results never return to the driver. Here is the output of the worker log: 15/01/23 15:51:38 INFO worker.ExecutorRunner: Launch

Results never return to driver | Spark Custom Reader

2015-01-22 Thread Harihar Nahak
Hi All, I wrote a custom reader to read from a DB. It returns keys and values as expected, but after it finishes, the results never return to the driver. Here is the output of the worker log: 15/01/23 15:51:38 INFO worker.ExecutorRunner: Launch command: java -cp

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-12-28 Thread Harihar Nahak
Yes, I tried that too. I used the pre-built Spark 1.1 release. If there are upcoming changes to the GraphX library, or changes in Spark 1.2, just let me know and I can try them. --Harihar

Re: hello

2014-12-18 Thread Harihar Nahak

Re: Spark GraphX question.

2014-12-18 Thread Harihar Nahak

Re: RDDs being cleaned too fast

2014-12-16 Thread Harihar Nahak

Re: NumberFormatException

2014-12-15 Thread Harihar Nahak

Re: Multiple SparkContexts in same Driver JVM

2014-11-30 Thread Harihar Nahak

Re: RDDs join problem: incorrect result

2014-11-30 Thread Harihar Nahak

Re: GraphX:java.lang.NoSuchMethodError:org.apache.spark.graphx.Graph$.apply

2014-11-30 Thread Harihar Nahak

Re: Edge List File in GraphX

2014-11-30 Thread Harihar Nahak

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-27 Thread Harihar Nahak
-0800, Harihar Nahak [hidden email] wrote: According to the documentation, GraphX runs 10x faster than normal Spark. So I ran the PageRank algorithm in both: [...] Local mode (machine: 8 cores; 16 GB memory; 2.80 GHz Intel i7

Re: Lifecycle of RDD in spark-streaming

2014-11-27 Thread Harihar Nahak

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-26 Thread Harihar Nahak
Hi Guys, has anyone experienced the same thing as above? --Harihar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-or-GraphX-runs-fast-a-performance-comparison-on-Page-Rank-tp19710p19909.html Sent from the Apache Spark User List

Re: Configuring custom input format

2014-11-25 Thread Harihar Nahak
Hi, I'm trying to write a custom input format for CSV files. Could you share a little more about what you read as input and what you have implemented? I'll try to replicate it, and if I find something interesting at my end I'll let you know. Thanks, Harihar

Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-24 Thread Harihar Nahak
Hi All, I started exploring Spark two months ago. I'm looking for concrete features of both Spark and GraphX so that I can decide which to use, based on which gives the best performance. According to the documentation, GraphX runs 10x faster than normal Spark. So I ran PageRank

Re: Is there a way to turn on spark eventLog on the worker node?

2014-11-24 Thread Harihar Nahak
You can set the same parameter when launching an application. If you use spark-submit, try --conf to pass those variables, or set them through SparkConf; either way the setting applies to the logs for both driver and workers. --Harihar
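
A minimal sketch of the SparkConf route, assuming the parameters in question are the standard spark.eventLog.enabled and spark.eventLog.dir properties (the snippet does not name them). The application name and log directory are hypothetical.

```scala
import org.apache.spark.SparkConf

// Assumption: "the same parameter" means the standard event-log properties.
val conf = new SparkConf()
  .setAppName("event-log-sketch")                        // hypothetical application name
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///tmp/spark-events") // hypothetical directory

// Show the settings that would be handed to a SparkContext.
println(conf.toDebugString)
```

With spark-submit, the same two properties could be passed as --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:///tmp/spark-events.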

Re: How to join two RDDs with mutually exclusive keys

2014-11-20 Thread Harihar Nahak

Re: How to join two RDDs with mutually exclusive keys

2014-11-20 Thread Harihar Nahak
Thanks Daniel, I applied the pair-RDD join: val countByUsername = file1.join(file2).map { case (id, (username, count)) => (id, username, count) } --Harihar
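
A minimal local-mode sketch of the join above. The sample data and the names users and counts are hypothetical stand-ins for file1 and file2, which the message does not show.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed before Spark 1.3

val sc = new SparkContext(
  new SparkConf().setAppName("join-sketch").setMaster("local[2]"))

// Hypothetical inputs keyed by id.
val users  = sc.parallelize(Seq((1, "alice"), (2, "bob"))) // (id, username)
val counts = sc.parallelize(Seq((1, 10), (3, 7)))          // (id, count)

// join is an inner join: only ids present on both sides survive.
val countByUsername = users.join(counts)
  .map { case (id, (username, count)) => (id, username, count) }
  .collect()

countByUsername.foreach(println) // prints (1,alice,10)
sc.stop()
```

Because join is an inner join, ids 2 and 3 are dropped here; with fully disjoint key sets the result would be empty.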

Can we make EdgeRDD and VertexRDD storage level to MEMORY_AND_DISK?

2014-11-19 Thread Harihar Nahak
Hi, I'm running out of memory when I run a GraphX program on a dataset of more than 10 GB. Normal Spark operations handled it well with StorageLevel.MEMORY_AND_DISK. With GraphX I found that only in-memory storage is allowed, because in the Graph constructor, this

How to get list of edges between two Vertex ?

2014-11-19 Thread Harihar Nahak
Hi, I have a graph in which two vertices can be connected by more than one edge. I need to find the top vertex pairs, i.e. those with the most calls between them. The output should look like (V1, V2, no. of edges). So I need to know how to find the total no. of edges b/w only those two
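
One possible approach, sketched under assumptions: key each edge by its (source, destination) pair, sum the occurrences, and sort by count. The call graph below is hypothetical sample data, and the sketch treats direction as significant, so 1 -> 2 and 2 -> 1 are counted separately.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed before Spark 1.3
import org.apache.spark.graphx.{Edge, Graph}

val sc = new SparkContext(
  new SparkConf().setAppName("edge-count-sketch").setMaster("local[2]"))

// Hypothetical call graph: every edge is one call from srcId to dstId.
val calls = sc.parallelize(Seq(
  Edge(1L, 2L, 1), Edge(1L, 2L, 1), Edge(1L, 2L, 1),
  Edge(2L, 3L, 1), Edge(2L, 3L, 1),
  Edge(3L, 1L, 1)))
val graph = Graph.fromEdges(calls, defaultValue = 0)

// Count parallel edges per (src, dst) pair and keep the two busiest pairs.
val topPairs = graph.edges
  .map(e => ((e.srcId, e.dstId), 1))
  .reduceByKey(_ + _)
  .map { case ((src, dst), n) => (src, dst, n) }
  .sortBy(_._3, ascending = false)
  .take(2)

topPairs.foreach(println) // prints (1,2,3) then (2,3,2)
sc.stop()
```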

Re: Can we make EdgeRDD and VertexRDD storage level to MEMORY_AND_DISK?

2014-11-19 Thread Harihar Nahak
I just figured it out: when constructing the Graph you can pass the storage level for both edges and vertices: Graph.fromEdges(edges, defaultValue = (,),StorageLevel.MEMORY_AND_DISK,StorageLevel.MEMORY_AND_DISK ) Thanks to this post: https://issues.apache.org/jira/browse/SPARK-1991 --Harihar
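
A minimal local-mode sketch of that call, using the named edgeStorageLevel and vertexStorageLevel parameters of Graph.fromEdges. The two-edge input and the empty-string default vertex value are hypothetical; the default value in the original message is garbled and is not reconstructed here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("storage-level-sketch").setMaster("local[2]"))

// Hypothetical edge list.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))

// Ask GraphX to keep both edges and vertices in memory, spilling to disk if needed.
val graph = Graph.fromEdges(
  edges,
  defaultValue = "", // hypothetical default vertex attribute
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

val numEdges = graph.numEdges
println(s"edges: $numEdges, edge storage level: ${graph.edges.getStorageLevel}")
sc.stop()
```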