Re: Tensor Flow

2016-12-12 Thread tog
Tensorframes is a project from Databricks (https://github.com/databricks/tensorframes). No commits for a couple of months, though. Does anyone have any insight into the status of the project? On Mon, 12 Dec 2016 at 19:31 Meeraj Kunnumpurath wrote: > Apologies. okay, I

Apache Groovy and Spark

2015-11-18 Thread tog
Hi I started playing with both Apache projects and quickly got that exception. Can anyone give me a hint on the problem so that I can dig further? It seems to be a problem with Spark loading some of the Groovy classes ... Any idea? Thanks Guillaume tog GroovySpark $ $GROOVY_HOME/bin

Re: Apache Groovy and Spark

2015-11-18 Thread tog
maybe now with static compilation and the Java 7 invoke-dynamic JARs things are better. I'm still unsure I'd use it in production, and, given Spark's focus on Scala and Python, I'd pick one of those two. > On 18 Nov 2015, at 20:35, tog <guillaume.all...@gmail.com> wrote

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread tog
Hi Bala, can't you build a simple dictionary and map those values to numbers? Cheers Guillaume On 5 November 2015 at 09:54, Balachandar R.A. wrote: > HI > > > I am new to spark MLlib and machine learning. I have a csv file that > consists of around 100 thousand rows and
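The dictionary approach suggested above can be sketched in plain Python; the column layout and category values are invented for illustration (in Spark the same lookup would typically be applied inside a `map` over the RDD):

```python
# Hypothetical rows: a categorical column followed by a numeric one.
rows = [
    ["red", "10"],
    ["green", "20"],
    ["red", "30"],
]

# Assign each distinct category a numeric code, then map values through it.
categories = sorted({row[0] for row in rows})
code = {value: i for i, value in enumerate(categories)}

encoded = [[code[row[0]], float(row[1])] for row in rows]
```

The downside raised later in the thread is that this requires a full pass to build the dictionary first, which is what HashingTF avoids.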

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread tog
However, I read about HashingTF which exactly > does this quite efficiently and can scale too. Hence, looking for a > solution using this technique. > > > regards > Bala > > > On 5 November 2015 at 18:50, tog <guillaume.all...@gmail.com> wrote
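The hashing trick that HashingTF implements can be sketched without Spark; the bucket count and tokens below are illustrative, not Spark's defaults:

```python
# Each term is hashed into a fixed number of buckets, so no per-term
# dictionary has to be built or broadcast: this is why it scales.
def hashing_tf(tokens, num_features=16):
    counts = [0] * num_features
    for token in tokens:
        counts[hash(token) % num_features] += 1
    return counts

vec = hashing_tf("spark spark groovy".split())
```

The trade-off versus the dictionary approach is that distinct terms can collide in the same bucket.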

Re: Question abt serialization

2015-07-28 Thread tog
it with just: (comment out line 27) println "Count of spark: " + file.filter({s -> s.contains('spark')}).count() Thanks Best Regards On Sun, Jul 26, 2015 at 12:43 AM, tog <guillaume.all...@gmail.com> wrote: Hi I have been using Spark for quite some time using either Scala or Python. I wanted

Question abt serialization

2015-07-25 Thread tog
not doing correctly here. Thanks tog Groovy4Spark $ groovy GroovySparkWordcount.groovy class org.apache.spark.api.java.JavaRDD true true Caught: org.apache.spark.SparkException: Task not serializable org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner

sliding

2015-07-02 Thread tog
Hi Sorry for this Scala/Spark newbie question. I am creating an RDD which represents a large time series this way: val data = sc.textFile("somefile.csv") case class Event( time: Double, x: Double, vztot: Double ) val events = data.filter(s => !s.startsWith("GMT")).map{s =>
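The parse step in that snippet (drop header lines starting with "GMT", map the rest onto an Event) looks roughly like this in plain Python; the three-float field layout is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float
    x: float
    vztot: float

# Assumed layout: three comma-separated floats; header rows start with "GMT".
lines = [
    "GMT,x,vztot",
    "0.0,1.0,2.0",
    "0.5,3.0,4.0",
]
events = [
    Event(*map(float, line.split(",")))
    for line in lines
    if not line.startswith("GMT")
]
```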

Re: sliding

2015-07-02 Thread tog
the time series. On 2 July 2015 at 18:25, Feynman Liang <fli...@databricks.com> wrote: What's the error you are getting? On Thu, Jul 2, 2015 at 9:37 AM, tog <guillaume.all...@gmail.com> wrote: Hi Sorry for this Scala/Spark newbie question. I am creating an RDD which represents a large time series

Re: sliding

2015-07-02 Thread tog
, 2015 at 2:33 PM, tog <guillaume.all...@gmail.com> wrote: Was complaining about the Seq ... Moved it to val eventsfiltered = events.sliding(3).map(s => Event(s(0).time, (s(0).x+s(1).x+s(2).x)/3.0, (s(0).vztot+s(1).vztot+s(2).vztot)/3.0)) and that is working. Anyway this is not what I wanted to do
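The three-sample moving average being built there can be sketched without Spark (a single numeric field stands in for the Event class):

```python
# Average each overlapping window of `width` consecutive samples,
# mirroring events.sliding(3) followed by a per-window mean.
def moving_average(values, width=3):
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]

smoothed = moving_average([1.0, 2.0, 3.0, 4.0])  # -> [2.0, 3.0]
```

Note the windows overlap, which is exactly what the poster says they did not want; the next message in the thread shows the non-overlapping variant.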

Re: sliding

2015-07-02 Thread tog
, d, e), 2), ((d, e, f), 3)] After filter: [((a,b,c), 0), ((d, e, f), 3)], which is what I'm assuming you want (non-overlapping buckets)? You can then do something like .map(func(_._1)) to apply func (e.g. min, max, mean) to the 3-tuples. On Thu, Jul 2, 2015 at 3:20 PM, tog guillaume.all
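The non-overlapping-bucket trick described there (pair each sliding window with its index, then keep only indices that are multiples of the window width) can be sketched as:

```python
# Keep one window in every `width`, turning overlapping sliding windows
# into disjoint buckets; an aggregate can then be mapped over each tuple.
def buckets(values, width=3):
    windows = [
        (tuple(values[i:i + width]), i)
        for i in range(len(values) - width + 1)
    ]
    return [w for w, i in windows if i % width == 0]

groups = buckets(list("abcdef"))  # [('a','b','c'), ('d','e','f')]
```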

Re: Time series data

2015-06-29 Thread tog
Hi Have you tested the Cloudera project https://github.com/cloudera/spark-timeseries ? Let me know how you progress on that route, as I am also interested in that topic. Cheers On 26 June 2015 at 14:07, Caio Cesar Trucolo <truc...@gmail.com> wrote: Hi everyone! I am working with

Re: spark and binary files

2015-05-12 Thread tog
. You may want to take a deeper look at SparkContext.newAPIHadoopRDD to load your data. On Sat, May 9, 2015 at 4:48 PM, tog <guillaume.all...@gmail.com> wrote: Hi I have an application that currently runs using MR. It currently starts

spark and binary files

2015-05-09 Thread tog
Hi I have an application that currently runs using MR. It starts by extracting information from a proprietary binary file that is copied to HDFS. The application creates business objects from information extracted from the binary files. Later those objects are used for further
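The extraction step might look like the sketch below; the record layout (a little-endian int id plus a double) is invented, since the thread does not describe the proprietary format:

```python
import struct

# Hypothetical fixed-size record: 4-byte little-endian int + 8-byte double.
RECORD = struct.Struct("<id")

def read_records(blob):
    # Walk the byte blob record by record and yield decoded tuples,
    # standing in for "creating business objects" from the binary file.
    for offset in range(0, len(blob), RECORD.size):
        yield RECORD.unpack_from(blob, offset)

blob = RECORD.pack(1, 3.5) + RECORD.pack(2, 7.25)
records = list(read_records(blob))
```

With a splittable fixed-size layout like this, each Spark partition could decode its own byte range, which is what makes the move from MR attractive.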

parallelism on binary file

2015-05-08 Thread tog
Hi I have an application that currently runs using MR. It starts by extracting information from a proprietary binary file that is copied to HDFS. The application creates business objects from information extracted from the binary files. Later those objects are used for further