Re: [ANN] Sparkling, a Clojure-API to Apache Spark.

2016-01-17 Thread Drew R
Hi,
Is it possible to run lein repl against a standalone Spark cluster?
I launched a cluster on localhost with the master at spark://zhmyh-osx.local:7077 
and tried to run the following commands: 

(require '[sparkling.conf :as conf])

(require '[sparkling.core :as spark])

(def c (-> (conf/spark-conf)
           (conf/master "spark://zhmyh-osx.local:7077")
           (conf/app-name "sparkling-example")))

(def sc (spark/spark-context c))

(def data (spark/parallelize sc ["a" "b" "c" "d" "e"]))

(spark/first data)

and got this:
tf-idf.core=> (spark/first data)
2016-01-17 10:42:22,977  INFO spark.SparkContext:59 - Starting job: first 
at NativeMethodAccessorImpl.java:-2
2016-01-17 10:42:23,004  INFO cluster.SparkDeploySchedulerBackend:59 - 
Registered executor: 
AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.200.217:54884/user/Executor#797341151])
 
with ID 0
2016-01-17 10:42:23,014  INFO scheduler.DAGScheduler:59 - Got job 0 (first 
at NativeMethodAccessorImpl.java:-2) with 1 output partitions
2016-01-17 10:42:23,015  INFO scheduler.DAGScheduler:59 - Final stage: 
ResultStage 0(first at NativeMethodAccessorImpl.java:-2)
2016-01-17 10:42:23,016  INFO scheduler.DAGScheduler:59 - Parents of final 
stage: List()
2016-01-17 10:42:23,017  INFO scheduler.DAGScheduler:59 - Missing parents: 
List()
2016-01-17 10:42:23,031  INFO scheduler.DAGScheduler:59 - Submitting 
ResultStage 0 (# 
ParallelCollectionRDD[0] at parallelize at 
NativeMethodAccessorImpl.java:-2), which has no missing parents
2016-01-17 10:42:23,247  INFO cluster.SparkDeploySchedulerBackend:59 - 
Registered executor: 
AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.200.217:54886/user/Executor#-391700515])
 
with ID 1
2016-01-17 10:42:23,265  INFO storage.MemoryStore:59 - 
ensureFreeSpace(1512) called with curMem=0, maxMem=1030823608
2016-01-17 10:42:23,268  INFO storage.MemoryStore:59 - Block broadcast_0 
stored as values in memory (estimated size 1512.0 B, free 983.1 MB)
2016-01-17 10:42:23,349  INFO storage.BlockManagerMasterEndpoint:59 - 
Registering block manager 192.168.200.217:54890 with 530.0 MB RAM, 
BlockManagerId(0, 192.168.200.217, 54890)
2016-01-17 10:42:23,482  INFO storage.BlockManagerMasterEndpoint:59 - 
Registering block manager 192.168.200.217:54891 with 530.0 MB RAM, 
BlockManagerId(1, 192.168.200.217, 54891)
2016-01-17 10:42:23,639  INFO storage.MemoryStore:59 - ensureFreeSpace(976) 
called with curMem=1512, maxMem=1030823608
2016-01-17 10:42:23,640  INFO storage.MemoryStore:59 - Block 
broadcast_0_piece0 stored as bytes in memory (estimated size 976.0 B, free 
983.1 MB)
2016-01-17 10:42:23,643  INFO storage.BlockManagerInfo:59 - Added 
broadcast_0_piece0 in memory on 127.0.0.1:54879 (size: 976.0 B, free: 983.1 
MB)
2016-01-17 10:42:23,646  INFO spark.SparkContext:59 - Created broadcast 0 
from broadcast at DAGScheduler.scala:861
2016-01-17 10:42:23,652  INFO scheduler.DAGScheduler:59 - Submitting 1 
missing tasks from ResultStage 0 (# ParallelCollectionRDD[0] at parallelize at 
NativeMethodAccessorImpl.java:-2)
2016-01-17 10:42:23,654  INFO scheduler.TaskSchedulerImpl:59 - Adding task 
set 0.0 with 1 tasks
2016-01-17 10:42:23,720  INFO scheduler.TaskSetManager:59 - Starting task 
0.0 in stage 0.0 (TID 0, 192.168.200.217, PROCESS_LOCAL, 1946 bytes)
2016-01-17 10:42:23,928  WARN scheduler.TaskSetManager:71 - Lost task 0.0 
in stage 0.0 (TID 0, 192.168.200.217): java.lang.IllegalStateException: 
unread block data
at 
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

2016-01-17 10:42:23,942  INFO scheduler.TaskSetManager:59 - Starting task 
0.1 in stage 0.0 (TID 1, 192.168.200.217, PROCESS_LOCAL, 1946 bytes)
2016-01-17 10:42:24,145  INFO scheduler.TaskSetManager:59 - Lost task 0.1 
in stage 0.0 (TID 1) on executor 192.168.200.217: 
java.lang.IllegalStateException (unread block data) [duplicate 1]
2016-01-17 10:42:24,155  INFO scheduler.TaskSetManager:59 - Starting task 
0.2 in stage 0.0 (TID 2, 192.168.200.217, PROCESS_LOCAL, 1946 bytes)
2016-01-17 10:42:24,168  INFO scheduler.TaskSetManager:59 - …
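
For context, "unread block data" on the executors usually means they could not deserialize the task, which in practice is most often a Spark version mismatch between the driver's dependencies and the cluster, or the application classes not being on the executors' classpath. One common remedy is to build an uberjar (lein uberjar) and register it on the SparkConf so the workers can fetch it. The sketch below is untested; conf/jars is assumed to wrap SparkConf.setJars (as in flambo), and the jar path is hypothetical:

```clojure
(require '[sparkling.conf :as conf])

(def c (-> (conf/spark-conf)
           (conf/master "spark://zhmyh-osx.local:7077")
           (conf/app-name "sparkling-example")
           ;; conf/jars is assumed here to mirror flambo's wrapper around
           ;; SparkConf.setJars; if sparkling lacks it, call .setJars directly.
           ;; The jar name is a placeholder for your own uberjar.
           (conf/jars ["target/tf-idf-standalone.jar"])))
```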

Re: [ANN] Sparkling, a Clojure-API to Apache Spark.

2015-05-18 Thread Tilak Thapa
Oh yes. Thanks Dmitriy.

On Monday, May 18, 2015 at 4:13:09 AM UTC-5, Dmitriy Morozov wrote:
>
>
> Hi Tilak!
>
> You are trying to call groupByKey on an instance of JavaRDD, which doesn't 
> have that method. You need a JavaPairRDD, which you can create with 
> spark/map-to-pair.
>
> Good luck!
>
> On Wednesday, January 7, 2015 at 2:23:42 PM UTC+2, chris_betz wrote:
>>
>> Hi,
>>
>>
>> we just released Sparkling (https://gorillalabs.github.io/sparkling/), 
>> our take on an API to Apache Spark.
>>
>>
>>
>> With an eye on speed for very large amounts of data we improved clj-spark 
>> and flambo to get us the speed we need for our production environment.
>>
>>
>> See https://gorillalabs.github.io/sparkling/articles/getting_started.html 
>> for a quickstart or dive directly into our playground project by 
>> git clone 
>> https://github.com/gorillalabs/sparkling-getting-started.git
>>
>>
>>
>> Happy hacking
>>
>> Chris
>> (@gorillalabs_de )
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Sparkling, a Clojure-API to Apache Spark.

2015-05-18 Thread Dmitriy Morozov

Hi Tilak!

You are trying to call groupByKey on an instance of JavaRDD, which doesn't have 
that method. You need a JavaPairRDD, which you can create with 
spark/map-to-pair.

Good luck!
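
A minimal sketch of the difference, assuming an existing SparkContext sc:

```clojure
(require '[sparkling.core :as spark])

;; spark/map yields a JavaRDD -- there is no groupByKey on it:
(def plain (spark/map (fn [x] [x 1])
                      (spark/parallelize sc ["a" "b" "a"])))

;; spark/map-to-pair yields a JavaPairRDD, which does have groupByKey:
(->> (spark/parallelize sc ["a" "b" "a"])
     (spark/map-to-pair (fn [x] (spark/tuple x 1)))
     (spark/group-by-key))
```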

On Wednesday, January 7, 2015 at 2:23:42 PM UTC+2, chris_betz wrote:
>
> Hi,
>
>
> we just released Sparkling (https://gorillalabs.github.io/sparkling/), 
> our take on an API to Apache Spark.
>
>
>
> With an eye on speed for very large amounts of data we improved clj-spark 
> and flambo to get us the speed we need for our production environment.
>
>
> See https://gorillalabs.github.io/sparkling/articles/getting_started.html 
> for a quickstart or dive directly into our playground project by 
> git clone https://github.com/gorillalabs/sparkling-getting-started.git
>
>
>
> Happy hacking
>
> Chris
> (@gorillalabs_de )
>



Re: [ANN] Sparkling, a Clojure-API to Apache Spark.

2015-05-17 Thread Tilak Thapa
Hi,
Could anybody please help me figure out the following error with the 
group-by-key fn?

(defn sorted-tuple [p f]
  (if (< (.compareTo p f) 0)
    (spark/tuple p f)
    (spark/tuple f p)))

(defn tuples-list [[p & frens]]
  (map #(spark/tuple (sorted-tuple p %) frens) frens))


(->> (spark/parallelize sc ["100 200 300 400 500 600", "200 100 300 400",
                            "300 100 200 400 500", "400 100 200 300",
                            "500 100 300", "600 100"])
     (spark/map #(split-at-space %))
     (spark/flat-map #(tuples-list %))
     (spark/group-by-key))


CompilerException java.lang.IllegalArgumentException: No matching field found: 
groupByKey for class org.apache.spark.api.java.JavaRDD, compiling:


I've been trying hard to figure it out but could not. It works if I replace 
group-by-key with 

(spark/group-by #(._1 %))

but that's not what I want.
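
Presumably the pipeline needs to emit the tuples into a JavaPairRDD from the 
start, something like this (untested sketch; spark/flat-map-to-pair is assumed 
to exist in sparkling, mirroring JavaRDD.flatMapToPair):

```clojure
(->> (spark/parallelize sc ["100 200 300 400 500 600", "200 100 300 400"])
     (spark/map #(split-at-space %))
     ;; flat-map-to-pair keeps each emitted tuple as a key/value pair,
     ;; so the result is a JavaPairRDD and group-by-key applies:
     (spark/flat-map-to-pair #(tuples-list %))
     (spark/group-by-key))
```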


Anybody please?



thanks

-tilak



On Wednesday, January 7, 2015 at 6:23:42 AM UTC-6, chris_betz wrote:
>
> Hi,
>
>
> we just released Sparkling (https://gorillalabs.github.io/sparkling/), 
> our take on an API to Apache Spark.
>
>
>
> With an eye on speed for very large amounts of data we improved clj-spark 
> and flambo to get us the speed we need for our production environment.
>
>
> See https://gorillalabs.github.io/sparkling/articles/getting_started.html 
> for a quickstart or dive directly into our playground project by 
> git clone https://github.com/gorillalabs/sparkling-getting-started.git
>
>
>
> Happy hacking
>
> Chris
> (@gorillalabs_de )
>
