> column of label
> indices. The indices are in [0, numLabels), ordered by label frequencies."
>
> Xinh
>
> On Tue, Jun 28, 2016 at 12:29 AM, Jaonary Rabarisoa <jaon...@gmail.com>
> wrote:
>
>> Dear all,
>>
>> I'm trying to find a way to transform a D
Dear all,
I'm trying to find a way to transform a DataFrame into data that is
more suitable for a third-party classification algorithm. The DataFrame has
two columns: "feature", represented by a vector, and "label", represented by
a string. I want the "label" to be a number between [0, number of
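For reference, a minimal sketch of the StringIndexer approach described in the
reply above, assuming a DataFrame df with a string column named "label":

import org.apache.spark.ml.feature.StringIndexer

// Maps each distinct string label to an index in [0, numLabels),
// ordered by label frequency (the most frequent label gets index 0.0).
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
val indexed = indexer.fit(df).transform(df)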
Hi all,
Is it possible to learn a Gaussian mixture model with a diagonal covariance
matrix with the GMM algorithm implemented in MLlib? It seems to be possible,
but I can't figure out how to do that.
Cheers,
Jao
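For context, a minimal sketch of fitting MLlib's GaussianMixture, assuming
data: RDD[Vector]; as far as I can tell it estimates full covariance matrices,
and a diagonal constraint is not exposed as an option:

import org.apache.spark.mllib.clustering.GaussianMixture

// Fits a GMM with full covariance matrices; approximating a diagonal model
// would mean post-processing model.gaussians yourself.
val model = new GaussianMixture().setK(3).run(data)
model.gaussians.foreach(g => println(s"mu=${g.mu} sigma=${g.sigma}"))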
Hi there,
The Pipeline of the ml package is really a great feature and we use it in our
everyday tasks. But we have some use cases where we need a Pipeline of
Transformers only, and the problem is that there's no training phase in that
case. For example, we have a pipeline of image analytics with the
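For what it's worth, a Pipeline whose stages are all Transformers should still
work; a minimal sketch, assuming two existing transformers t1 and t2
(hypothetical names):

import org.apache.spark.ml.Pipeline

// With Transformer-only stages, fit() has nothing to estimate and
// essentially just packages the stages into a PipelineModel.
val pipeline = new Pipeline().setStages(Array(t1, t2))
val model = pipeline.fit(df) // no real training happens here
val out = model.transform(df)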
Hi there,
The current API of ml.Transformer uses only DataFrame as input. I have a use
case where I need to transform a single element, for example transforming
an element coming from Spark Streaming. Is there any reason for this, or will
ml.Transformer support transforming a single element later?
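As a workaround sketch (not an official single-element API), one can wrap the
element in a one-row DataFrame, assuming a fitted model and an sqlContext in
scope:

import sqlContext.implicits._

// Wrap the single element, transform, and unwrap. This pays DataFrame
// overhead per element, so it is only a stopgap for streaming use.
val single = Seq(Tuple1(element)).toDF("features")
val result = model.transform(single).first()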
(locality-sensitive hashing).
A quick search gave this link to a Spark implementation:
http://stackoverflow.com/questions/2771/spark-implementation-for-locality-sensitive-hashing
On Wed, Aug 26, 2015 at 7:35 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
I'm trying to find
Dear all,
I'm trying to find an efficient way to build a k-NN graph for a large
dataset. Precisely, I have a large set of high-dimensional vectors (say d >>
1) and I want to build a graph where those high-dimensional points
are the vertices and each one is linked to its k nearest neighbors based
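As a naive baseline, a brute-force sketch (O(n^2) via cartesian, so only
workable for moderate n; vectors: RDD[(Long, Vector)] and k are assumed):

import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Compare every pair of distinct points, keep the k closest per vertex.
val knn = vectors.cartesian(vectors)
  .filter { case ((i, _), (j, _)) => i != j }
  .map { case ((i, u), (j, v)) => (i, (j, Vectors.sqdist(u, v))) }
  .groupByKey()
  .mapValues(_.toSeq.sortBy(_._2).take(k))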
In this example, everything works except saving to the parquet file.
On Mon, May 11, 2015 at 4:39 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
MyDenseVectorUDT does exist in the assembly jar, and in this example all the
code is in a single file to make sure everything is included.
On Tue, Apr 21
(or in the assembly jar) at runtime. Make sure the
full class name (with package name) is used. Btw, UDTs are not public
yet, so please use them with caution. -Xiangrui
On Fri, Apr 17, 2015 at 12:45 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Here is an example of code
Dear all,
Here is an issue that gets me mad. I wrote a UserDefinedType in order to be
able to store a custom type in a parquet file. In my code I just create a
DataFrame with my custom data type and write it into a parquet file. When I
run my code directly inside IDEA everything works like a
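For context, a UDT declaration of the shape involved, sketched from the
then-internal API (Spark ~1.3; signatures changed across versions, so treat
this as illustrative only):

import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[MyDenseVectorUDT])
class MyDenseVector(val data: Array[Double]) extends Serializable

class MyDenseVectorUDT extends UserDefinedType[MyDenseVector] {
  // Storage representation in SQL/parquet: an array of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
  override def serialize(obj: Any): Any = obj match {
    case v: MyDenseVector => v.data.toSeq
  }
  override def deserialize(datum: Any): MyDenseVector = datum match {
    case data: Seq[_] => new MyDenseVector(data.asInstanceOf[Seq[Double]].toArray)
  }
  override def userClass: Class[MyDenseVector] = classOf[MyDenseVector]
}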
Any ideas ?
On Thu, Apr 16, 2015 at 5:04 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Here is an issue that gets me mad. I wrote a UserDefinedType in order to be
able to store a custom type in a parquet file. In my code I just create a
DataFrame with my custom data type and write
assembly jar?
On Thu, Apr 16, 2015 at 4:46 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Any ideas ?
On Thu, Apr 16, 2015 at 5:04 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Here is an issue that gets me mad. I wrote a UserDefinedType in order to
be able to store a custom type
Hi all,
If you follow the example of schema merging in the spark documentation
http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging
you obtain the following results when you want to load the resulting data:

single | triple | double
1      | 3      | null
2      | 6      | null
4
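For reference, the documented way to trigger schema merging on read (later
Spark versions made it opt-in via the mergeSchema option, since it is
expensive):

// Read the parent directory with schema merging enabled.
val merged = sqlContext.read.option("mergeSchema", "true").parquet("data/test_table")
merged.printSchema()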
I forgot to mention that the imageId field is a custom Scala object. Do I
need to implement some special methods to make it work (equals, hashCode)?
On Tue, Apr 14, 2015 at 5:00 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
In the latest version of spark there's a feature called
Dear all,
In the latest version of Spark there's a feature called automatic
partition discovery and schema migration for Parquet. As far as I know,
this gives the ability to split the DataFrame into several parquet files,
and by just loading the parent directory one can get the global schema of
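A minimal sketch of writing in the key=value directory layout that partition
discovery picks up (partitionBy appeared on DataFrameWriter around Spark 1.4;
the column name "year" is hypothetical):

// Writes one sub-directory per distinct value of "year", e.g. /data/events/year=2015/
df.write.partitionBy("year").parquet("/data/events")

// Loading the parent directory recovers "year" as a column of the global schema.
val all = sqlContext.read.parquet("/data/events")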
, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hmm, I got the same error with the master. Here is another test example that
fails. Here, I explicitly create a Row RDD which corresponds to the use case
I am in:
object TestDataFrame {
  def main(args: Array[String]): Unit
, Mar 31, 2015 at 11:18 PM, Xiangrui Meng men...@gmail.com wrote:
I cannot reproduce this error on master, but I'm not aware of any
recent bug fixes that are related. Could you build and try the current
master? -Xiangrui
On Tue, Mar 31, 2015 at 4:10 AM, Jaonary Rabarisoa jaon...@gmail.com
to RDD.
Thanks
Shivaram
On Mon, Mar 30, 2015 at 8:37 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
I'm still struggling to make a pre-trained Caffe model transformer for
DataFrame work. The main problem is that creating a Caffe model inside the
UDF is very slow and consumes
, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Following your suggestion, I ended up with the following implementation:

override def transform(dataSet: DataFrame, paramMap: ParamMap): DataFrame = {
  val schema = transformSchema(dataSet.schema, paramMap, logging = true)
  val map
Hi all,
A DataFrame with a user-defined type (here mllib.Vector) created with
sqlContext.createDataFrame can't be saved to a parquet file; it raises a
ClassCastException:
org.apache.spark.mllib.linalg.DenseVector cannot be cast to
org.apache.spark.sql.Row
Here is an example of code to reproduce
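A minimal sketch of the kind of code that triggers this, assuming a
SQLContext named sqlContext (df.write is the 1.4+ API; on 1.3 it was
saveAsParquetFile):

import org.apache.spark.mllib.linalg.Vectors

// Build a DataFrame with a Vector (UDT-backed) column and try to save it.
val df = sqlContext.createDataFrame(
  Seq((0, Vectors.dense(1.0, 2.0)), (1, Vectors.dense(3.0, 4.0)))
).toDF("id", "features")
df.write.parquet("/tmp/vectors.parquet") // the ClassCastException appeared here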
. There is not a
great way to do something equivalent to mapPartitions with UDFs right now.
On Tue, Mar 3, 2015 at 4:36 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Here is my current implementation with the current master version of spark:

class DeepCNNFeature extends Transformer with HasInputCol
, 2015 at 11:36 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
In fact, by activating netlib with native libraries it goes faster.
Glad you got it to work! Better performance was one of the reasons we
Thanks
Shivaram
On Tue, Mar 10, 2015 at 9:57 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
I'm trying to play with the implementation of the least-squares solver (Ax =
b) in mlmatrix.TSQR, where A is a 5*1024 matrix and b a 5*10
matrix. It works, but I notice that it's 8 times slower
://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Is there a least-squares solver based on DistributedMatrix that we can use
out of the box
/mlmatrix/NormalEquations.scala
On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Is there a least-squares solver based on DistributedMatrix that we can use
out of the box in the current (or the master) version of Spark?
It seems that the only least-squares
Hi Cesar,
Yes, you can define a UDT with the new DataFrame, the same way that
SchemaRDD did.
Jaonary
On Fri, Mar 6, 2015 at 4:22 PM, Cesar Flores ces...@gmail.com wrote:
The SchemaRDD supports the storage of user defined classes. However, in
order to do that, the user class needs to
/NormalEquations.scala
On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Is there a least-squares solver based on DistributedMatrix that we can use
out of the box in the current (or the master) version of Spark?
It seems that the only least-squares solver available in Spark
between Spark 1.2 and 1.3. In 1.3, the DSL is much
improved and makes it easier to create a new column.
Joseph
On Sun, Mar 1, 2015 at 1:26 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
class DeepCNNFeature extends Transformer ... {
  override def transform(data: DataFrame, paramMap
Dear all,
Is there a least-squares solver based on DistributedMatrix that we can use
out of the box in the current (or the master) version of Spark?
It seems that the only least-squares solver available in Spark is private to
the recommender package.
Cheers,
Jao
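In the absence of a public solver, a hedged normal-equations sketch: it
assumes n is small enough for a local solve, data: RDD[(Vector, Double)]
holding rows (a_i, b_i), and Breeze on the classpath:

import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Solve min ||Ax - b|| via the normal equations A^T A x = A^T b;
// only the n x n Gramian and the n-vector A^T b are held locally.
val A = new RowMatrix(data.map(_._1))
val gram = A.computeGramianMatrix() // n x n, column-major array underneath
val atb = data.map { case (a, b) => BDV(a.toArray) * b }.reduce(_ + _)
val x = new BDM(gram.numRows, gram.numCols, gram.toArray) \ atb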
class DeepCNNFeature extends Transformer ... {
  override def transform(data: DataFrame, paramMap: ParamMap): DataFrame = {
    // How can I do a mapPartitions on the underlying RDD and
    // then add the column ?
  }
}
On Sun, Mar 1, 2015 at 10:23 AM, Jaonary Rabarisoa jaon
:
myRDD.mapPartitions { myDataOnPartition =>
  val myModel = ... // instantiate neural network on this partition
  myDataOnPartition.map { myDatum => myModel.predict(myDatum) }
}
I hope this helps!
Joseph
On Fri, Feb 27, 2015 at 10:27 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all
Dear all,
We mainly do large-scale computer vision tasks (image classification,
retrieval, ...). The Pipeline is really great stuff for that. We're trying
to reproduce the tutorial given on that topic during the latest Spark
Summit (
should be
`df.select($"image.data".as("features"))`.
On Tue, Feb 24, 2015 at 3:35 PM, Xiangrui Meng men...@gmail.com wrote:
If you make `Image` a case class, then select("image.data") should work.
On Tue, Feb 24, 2015 at 3:06 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
I have
Hi all,
I have a DataFrame that contains a user-defined type. The type is an image
with the following attributes:

class Image(w: Int, h: Int, data: Vector)

In my DataFrame, images are stored in a column named image that corresponds
to the following case class:

case class LabeledImage(label: Int,
issues you're running into?
Joseph
On Mon, Jan 19, 2015 at 12:59 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
I'm trying to implement a pipeline for computer vision based on the
latest ML package in Spark. The first step of my pipeline is to decode
images (JPEG, for instance) stored
Hi all,
I'm trying to run the master version of Spark in order to test some alpha
components in the ml package.
I followed the Spark build documentation and built it with:

$ mvn clean package

The build is successful, but when I try to run spark-shell I get the
following error:

Exception in
That's what I did.
On Mon, Feb 2, 2015 at 11:28 PM, Sean Owen so...@cloudera.com wrote:
Snapshot builds are not published. Unless you build and install snapshots
locally (like with mvn install) they won't be found.
On Feb 2, 2015 10:58 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Hi all
Hi all,
I'm trying to use the master version of Spark. I build and install it with:

$ mvn clean install

I manage to use it with the following configuration in my build.sbt:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.0-SNAPSHOT" % "provided",
  "org.apache.spark" %%
Hi all,
I'm trying to implement a pipeline for computer vision based on the latest
ML package in Spark. The first step of my pipeline is to decode images (JPEG,
for instance) stored in a parquet file.
For this, I begin by creating a UserDefinedType that represents a decoded
image stored in an array of
' benchmark implementation will be released
soon
On 9 January 2015 at 10:59, Marco Shaw marco.s...@gmail.com wrote:
Pretty vague on details:
http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A227199
On Jan 9, 2015, at 11:39 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi
Hi all,
DeepLearning algorithms are popular and achieve state-of-the-art
performance on several real-world machine learning problems. Currently
there is no DL implementation in Spark, and I wonder if there is ongoing
work on this topic.
We can do DL in Spark with Sparkling Water and H2O, but
Hi,
There's ongoing work on model export:
https://www.github.com/apache/spark/pull/3062
For now, since the linear regression model is serializable you can save it
as an object file:

sc.parallelize(Seq(model)).saveAsObjectFile(path)

then

val model = sc.objectFile[LinearRegressionModel](path).first()
model.predict(...)
On Mon, Dec 8, 2014 at 7:53 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
After some investigation, I learned that I can't compare kmeans in MLlib
with another kmeans implementation directly. The kmeans|| initialization
step takes more time than the algorithm implemented in Julia
Dear all,
I'm trying to understand the correct use case of the columnSimilarities
method implemented in RowMatrix.
As far as I know, this function computes pairwise similarities between the
columns of a given matrix. The DIMSUM paper says that it's efficient for
large m (rows) and small n (columns). In this case
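A minimal sketch, assuming rows: RDD[Vector] with a tall-skinny shape (large
m, small n):

import org.apache.spark.mllib.linalg.distributed.RowMatrix

val mat = new RowMatrix(rows)
// Exact all-pairs cosine similarities between columns (upper triangle).
val exact = mat.columnSimilarities()
// DIMSUM sampling: cheaper estimates; the threshold trades cost for accuracy.
val approx = mat.columnSimilarities(threshold = 0.1)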
+1 with 1.3-SNAPSHOT.
On Mon, Dec 1, 2014 at 5:49 PM, agg212 alexander_galaka...@brown.edu
wrote:
Thanks for your reply, but I'm still running into issues
installing/configuring the native libraries for MLlib. Here are the steps
I've taken, please let me know if anything is incorrect.
-
Hi all,
I'm trying to run a Spark job with spark-shell. What I want to do is
just count the number of lines in a file.
I start spark-shell with the default arguments, i.e. just with
./bin/spark-shell,
load the text file with sc.textFile(path) and then call count on my data.
When I do
that the Hadoop
InputFormat would make 52 splits for it. Data drives partitions, not
processing resource. Really, 8 splits is the minimum parallelism you
want. Several times your # of cores is better.
On Fri, Dec 5, 2014 at 8:51 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
I'm trying
Hi all,
I'm trying to run clustering with the kmeans algorithm. The size of my data
set is about 240k vectors of dimension 384.
Solving the problem with the kmeans available in Julia (kmeans++)
http://clusteringjl.readthedocs.org/en/latest/kmeans.html
takes about 8 minutes on a single core.
PM, Davies Liu dav...@databricks.com wrote:
Could you post your script to reproduce the results (and also how to
generate the dataset)? That will help us to investigate it.
On Fri, Dec 5, 2014 at 8:40 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hmm, here I use spark in local mode on my laptop
Dear all,
How can one save a kmeans model after training ?
Best,
Jao
Hi all,
I have a spark job that I build with sbt and I can run without any problem
with sbt run. But when I run it inside IntelliJ IDEA I get the following
error:

Exception encountered when invoking run on a nested suite - class
javax.servlet.FilterRegistration's signer information does not
to me. Have a try.
Good luck,
Niklas
On 23.10.2014 21:52, Jaonary Rabarisoa wrote:
Hi all,
I have the following case class that I want to use as a key in a
key-value RDD. I defined the equals and hashCode methods but it's not
working. What am I doing wrong?

case class PersonID(id
Dear all,
Is it possible to use any kind of object as a key in a PairRDD? When I use
a case class key, the groupByKey operation doesn't behave as I expect. I
want to use a case class to avoid using a large tuple, as it is easier to
manipulate.
Cheers,
Jaonary
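For reference, case-class keys normally work out of the box, since case
classes get structural equals/hashCode for free; a minimal sketch. One
hedged caveat: classes defined inside the spark-shell REPL have historically
behaved oddly as keys, so compiling the class into a jar is worth trying.

case class PersonID(id: String)

val rdd = sc.parallelize(Seq(PersonID("a") -> 1, PersonID("a") -> 2, PersonID("b") -> 3))
// Expected: PersonID("a") -> List(1, 2), PersonID("b") -> List(3)
rdd.groupByKey().mapValues(_.toList).collect().foreach(println)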
,
Sonal
Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
On Fri, Oct 17, 2014 at 12:28 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Dear all,
Is it possible to use any kind of object as a key in a PairRDD? When I
use a case class key, the groupByKey operation
Hi all,
I need to compute a similarity between elements of two large sets of high-
dimensional feature vectors.
Naively, I create all possible pairs of vectors with
features1.cartesian(features2) and then map the resulting pair RDD with
my similarity function.
The problem is that the cartesian
.
This implements the DIMSUM sampling scheme, recently merged into master
https://github.com/apache/spark/pull/1778.
Best,
Reza
On Fri, Oct 17, 2014 at 3:43 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
I need to compute a similarity between elements of two large sets of
high
And what about Hue http://gethue.com ?
On Sun, Oct 12, 2014 at 1:26 PM, andy petrella andy.petre...@gmail.com
wrote:
Dear Sparkers,
As promised, I've just updated the repo with a new name (for the sake of
clarity) and default branch, but especially with a dedicated README containing:
*
Dear all,
I have a spark job with the following configuration
val conf = new SparkConf()
  .setAppName("My Job")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "value.serializer.Registrator")
  .setMaster("local[4]")
in fact with --driver-memory 2G I can get it working
On Thu, Oct 9, 2014 at 6:20 PM, Xiangrui Meng men...@gmail.com wrote:
Please use --driver-memory 2g instead of --conf
spark.driver.memory=2g. I'm not sure whether this is a bug. -Xiangrui
On Thu, Oct 9, 2014 at 9:00 AM, Jaonary Rabarisoa
Hi all,
I'm using some functions from Breeze in a spark job but I get the following
build error:

Error: scalac: bad symbolic reference. A signature in RandBasis.class
refers to term math3
in package org.apache.commons which is not available.
It may be completely missing from the current
<version>3.3</version>
<scope>test</scope>
</dependency>

Adjusting the scope should solve the problem below.
On Fri, Sep 26, 2014 at 8:42 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
I'm using some functions from Breeze in a spark job but I get the
following build error
-mechanism.html#Dependency_Scope
Cheers
On Fri, Sep 26, 2014 at 8:57 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Thanks Ted. Can you tell me how to adjust the scope?
On Fri, Sep 26, 2014 at 5:47 PM, Ted Yu yuzhih...@gmail.com wrote:
spark-core's dependency on commons-math3 is @ test scope (core
Hi all,
I'm trying to process a large image data set and need some way to optimize
my implementation since it's very slow right now. In my current
implementation I store my images in an object file with the following fields:

case class Image(groupId: String, imageId: String, buffer: String)
Dear all,
I'm facing the following problem and I can't figure out how to solve it.
I need to join 2 RDDs in order to find their intersection. The first RDD
represents an image encoded as a base64 string associated with an image id.
The second RDD represents a set of geometric primitives (rectangles)
.
personTable.where('name in ("foo", "bar"))
On Thu, Aug 28, 2014 at 3:09 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
What is the expression that I should use with the Spark SQL DSL if I need to
retrieve
data with a field in a given set.
For example :
I have the following schema
case
[Expression]("a", "b", ...)
table("src").where('key in (longList: _*))
Also, note that I had to explicitly specify Expression as the type
parameter of Seq to ensure that the compiler converts "a" and "b" into
Spark SQL expressions.
On Thu, Aug 28, 2014 at 11:52 PM, Jaonary Rabarisoa jaon...@gmail.com
1.0.2
On Friday, August 29, 2014, Michael Armbrust mich...@databricks.com wrote:
What version are you using?
On Fri, Aug 29, 2014 at 2:22 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Still not working for me. I got a compilation error
Hi all,
What is the expression that I should use with the Spark SQL DSL if I need to
retrieve
data with a field in a given set.
For example :
I have the following schema
case class Person(name: String, age: Int)
And I need to do something like :
personTable.where('name in Seq("foo", "bar")) ?
Dear all,
I'm looking for an efficient way to manage external dependencies. I know
that one can add .jar or .py dependencies easily, but how can I handle other
types of dependencies? Specifically, I have some data processing algorithms
implemented with other languages (Ruby, Octave, Matlab, C++) and
. Just watch out for any environment variables
needed (you can pass them to pipe() as an optional argument if there are
some).
On August 25, 2014 at 12:41:29 AM, Jaonary Rabarisoa (jaon...@gmail.com)
wrote:
Hi all,
Is there someone that has tried to pipe an RDD into a Matlab script? I'm
trying to do
Hi all,
Is there someone that has tried to pipe an RDD into a Matlab script? I'm
trying to do something similar; it would help if one of you could give some
hints.
Best regards,
Jao
Dear all,
Is there any example of mapPartitions that forks an external process, or how
to make RDD.pipe work on every element of a partition?
Cheers,
Jaonary
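A minimal sketch of RDD.pipe, assuming a line-oriented executable at
/path/to/mybin (hypothetical path and flag):

// Each element is written to the process's stdin as one line;
// each line of the process's stdout becomes one output element.
val out = rdd.pipe(
  Seq("/path/to/mybin", "--flag"),  // command and arguments
  Map("MY_ENV" -> "value")          // optional environment variables
)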
query? Did you use the Hive Parser (your query was
submitted through hql(...)) or the basic SQL Parser (your query was
submitted through sql(...)).
Thanks,
Yin
On Tue, Jul 15, 2014 at 8:52 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
When running a join operation with Spark
Hi all,
I need to run a spark job that needs a set of images as input. I need
something that loads these images as an RDD, but I just don't know how to do
that. Do some of you have any ideas?
Cheers,
Jao
image as a single object in 1 rdd
would perhaps not be super optimized.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Wed, Jul 9, 2014 at 12:17 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all
Hi all,
I need to run a complex external process with a lot of dependencies from
spark. The pipe and addFile functions seem to be my friends, but there
are just some issues that I need to solve.
Precisely, the processes I want to run are C++ executables that may depend on
some libraries and
, that
the front end can talk to
On Tue, Jun 24, 2014 at 3:12 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
So far, I run my spark jobs with the spark-shell or spark-submit commands. I'd
like to go further and I wonder how to use spark as a backend of a web
application. Specifically, I want a frontend
Hi all,
I'm trying to use Spark SQL to store data in a parquet file. I create the
file and insert data into it with the following code:

val conf = new SparkConf().setAppName("MCT").setMaster("local[2]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
Is there an equivalent of wholeTextFiles for binary files, for example a set
of images?
Cheers,
Jaonary
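For reference, later Spark versions (1.2+, if I recall correctly) added
sc.binaryFiles, the binary analogue of wholeTextFiles; a minimal sketch:

// Returns RDD[(String, PortableDataStream)]: one record per file,
// keyed by path, with a lazily-readable stream as the value.
val images = sc.binaryFiles("hdfs:///data/images/*.jpg")
val bytes = images.mapValues(_.toArray()) // materialize each file as Array[Byte]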
Hi all,
So far, I run my spark jobs with the spark-shell or spark-submit commands. I'd
like to go further and I wonder how to use spark as a backend of a web
application. Specifically, I want a frontend application (built with nodejs)
to communicate with spark on the backend, so that every query
Hi all,
I'm just wondering if hybrid GPU/CPU computation is something that is
feasible with spark? And what would be the best way to do it?
Cheers,
Jaonary
machines.
Sent from my iPhone
On Apr 11, 2014, at 8:38 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
I'm just wondering if hybrid GPU/CPU computation is something that is
feasible with spark? And what would be the best way to do it?
Cheers,
Jaonary
On Mar 28, 2014, at 3:09 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
I forgot to mention that I don't really use all of my data. Instead I use
a sample extracted with randomSample.
On Fri, Mar 28, 2014 at 10:58 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Hi all,
I notice
Hi all,
Can someone give me some tips to compute the mean of an RDD by key, maybe
with combineByKey and StatCounter?
Cheers,
Jaonary
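A minimal sketch of a per-key mean with combineByKey, assuming
pairs: RDD[(K, Double)]:

// Accumulate (sum, count) per key, then divide.
val means = pairs.combineByKey(
  (v: Double) => (v, 1L),
  (acc: (Double, Long), v: Double) => (acc._1 + v, acc._2 + 1L),
  (a: (Double, Long), b: (Double, Long)) => (a._1 + b._1, a._2 + b._2)
).mapValues { case (sum, count) => sum / count }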
I forgot to mention that I don't really use all of my data. Instead I use a
sample extracted with randomSample.
On Fri, Mar 28, 2014 at 10:58 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Hi all,
I notice that RDD.cartesian has a strange behavior with cached and
uncached data. More
fully? I have seen this (in my limited
experience) pop up as a result of previous exceptions/errors, also as a
result of being unable to serialize objects etc.
Ognen
On 3/26/14, 10:39 AM, Jaonary Rabarisoa wrote:
I notice that I get this error when I'm trying to load an objectFile
Dear all,
Sorry for asking such a basic question, but can someone explain when one
should use mapPartitions instead of map?
Thanks
Jaonary
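As a hedged rule of thumb: map calls your function once per element, while
mapPartitions calls it once per partition, which lets you amortize expensive
setup. A sketch with a hypothetical costly resource:

rdd.mapPartitions { iter =>
  val model = loadExpensiveModel() // hypothetical: built once per partition
  iter.map(x => model.predict(x))  // reused for every element of the partition
}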
Dear all,
As a Spark newbie, I need some help to understand how saving an RDD to a
file behaves. After reading the post on saving single files efficiently
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-as-a-single-file-efficiently-td3014.html
I understand that each partition of the
Dear all,
I need to run a series of transformations that map an RDD into another RDD.
The computation changes over time and so does the resulting RDD. Each
result is then saved to disk in order to do further analysis (for
example, variation of the result over time).
The question is, if I
Hi
I need to partition my data, represented as an RDD, into n folds, run metrics
computation in each fold, and finally compute the mean of my metrics
over all the folds.
Can Spark do the data partitioning out of the box, or do I need to
implement it myself? I know that RDD has a partitions method
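For reference, MLlib ships a k-fold helper; a minimal sketch, where evaluate
is a hypothetical metric function over a (training, validation) pair:

import org.apache.spark.mllib.util.MLUtils

// Each entry is a (training, validation) split; the validations cover the data.
val folds = MLUtils.kFold(data, numFolds = 10, seed = 42)
val metrics = folds.map { case (training, validation) => evaluate(training, validation) }
val meanMetric = metrics.sum / metrics.length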
, then collect the output. This might be useful if, e.g., your
external process doesn't use line-oriented input/output.
-Ewen
Jaonary Rabarisoa jaon...@gmail.com
March 20, 2014 at 1:04 AM
Dear all,
Dear all,
Does Spark have a kind of Hadoop streaming feature to run external processes
Hi all,
I'm trying to build an evaluation platform based on Spark. The idea is to
run a blackbox executable (built with C/C++ or some scripting language).
This blackbox takes a set of data as input and outputs some metrics. Since
I have a huge amount of data, I need to distribute the computation
Dear All,
I'm trying to cluster data from native library code with Spark's kmeans||. In
my native library the data are represented as a matrix (rows = number of
data points, cols = dimension). For efficiency reasons, they are copied into
a one-dimensional Scala Array, row-major wise, so after the
Hi all,
I'm trying to read a sequenceFile that represents a set of JPEG images
generated using this tool:
http://stuartsierra.com/2008/04/24/a-million-little-files . According to
the documentation: "Each key is the name of a file (a Hadoop “Text”), the
value is the binary contents of the file (a
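A minimal sketch of reading such a file, assuming Text keys and BytesWritable
values (the tool's documented format) and a hypothetical path:

import org.apache.hadoop.io.{BytesWritable, Text}

// Hadoop Writables are reused and not serializable, so copy values out immediately.
val images = sc.sequenceFile("/data/images.seq", classOf[Text], classOf[BytesWritable])
  .map { case (name, bytes) => (name.toString, bytes.copyBytes()) }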