Hi,
org.apache.spark.mllib.linalg.Vector =
(1048576,[35587,884670],[3.458767233,3.458767233])
It is the sparse vector representation of the terms:
the first number (1048576) is the length of the vector,
[35587,884670] are the indices of the words, and
[3.458767233,3.458767233] are the TF-IDF values of those terms.
Thanks
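The (length, [indices], [values]) triple above can be unpacked by pairing the two parallel arrays; a minimal plain-Java sketch, with no Spark dependency (Spark's SparseVector stores the same three fields):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparseVectorDemo {
    // Rebuild the (index -> value) pairs from a sparse vector's parallel arrays.
    static Map<Integer, Double> toMap(int[] indices, double[] values) {
        Map<Integer, Double> terms = new LinkedHashMap<>();
        for (int i = 0; i < indices.length; i++) {
            terms.put(indices[i], values[i]);
        }
        return terms;
    }

    public static void main(String[] args) {
        int size = 1048576;                           // vector length (feature space)
        int[] indices = {35587, 884670};              // hashed term indices
        double[] values = {3.458767233, 3.458767233}; // TF-IDF weights
        System.out.println("length=" + size + ", terms=" + toMap(indices, values));
    }
}
```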
Try the Scala IDE Eclipse plugin to eclipsify the Spark project and import Spark
as an Eclipse project.
-Somnath
-Original Message-
From: Nan Xiao [mailto:xiaonan830...@gmail.com]
Sent: Thursday, May 28, 2015 12:32 PM
To: user@spark.apache.org
Subject: How to use Eclipse on Windows to build Spark
Hi Akhil,
I am running my program standalone. I am getting a null pointer exception when I
run the Spark program locally and try to save my RDD as a text file.
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, April 14, 2015 12:41 PM
To: Somnath Pandeya
Cc: user
JavaRDD<String> lineswithoutStopWords = nonEmptylines
        .map(new Function<String, String>() {
            /**
             *
             */
            private static final long
Hi All,
I want to find near-duplicate items in a given dataset.
For e.g., consider a dataset:
1. Cricket,bat,ball,stumps
2. Cricket,bowler,ball,stumps
3. Football,goalie,midfielder,goal
4. Football,referee,midfielder,goal
Here 1 and 2 are near duplicates (only field 2 is
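One common way to flag such near duplicates is set overlap (Jaccard similarity); a hedged plain-Java sketch over the four example records (the 0.5 threshold is an assumption, and on Spark-scale data you would use something like MinHash/LSH rather than comparing all pairs):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NearDuplicates {
    // Jaccard similarity: |intersection| / |union| of the two token sets.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        List<Set<String>> records = Arrays.asList(
                new HashSet<>(Arrays.asList("cricket", "bat", "ball", "stumps")),
                new HashSet<>(Arrays.asList("cricket", "bowler", "ball", "stumps")),
                new HashSet<>(Arrays.asList("football", "goalie", "midfielder", "goal")),
                new HashSet<>(Arrays.asList("football", "referee", "midfielder", "goal")));
        double threshold = 0.5; // assumed cutoff for "near duplicate"
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                if (jaccard(records.get(i), records.get(j)) >= threshold) {
                    // flags records 1 and 2, and records 3 and 4 (both 0.6)
                    System.out.println("near duplicates: " + (i + 1) + " and " + (j + 1));
                }
            }
        }
    }
}
```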
Thanks Akhil, it was a simple fix, as you said. I missed it. ☺
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, February 25, 2015 12:48 PM
To: Somnath Pandeya
Cc: user@spark.apache.org
Subject: Re: used cores are less then total no. of core
You can set the following
Hi All,
I am running a simple word count example on Spark (standalone cluster). In the
UI it is showing that each worker has 32 cores available, but while running the
jobs only 5 cores are being used.
What should I do to increase the number of cores used, or is it selected based
on the jobs?
Thanks
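The reply above ("You can set the following") is cut off; a hedged guess at the standalone-mode settings it likely meant, as spark-defaults.conf entries (the values here are assumptions for a 32-core worker, not taken from this thread):

```
# conf/spark-defaults.conf -- sketch; exact keys/values are an assumption
spark.cores.max       32    # cap on cores an application may take (standalone mode)
spark.executor.cores  32    # cores granted to each executor
```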
Maybe you can use the wholeTextFiles method, which returns the filename and
content of each file as a PairRDD, and then you can remove the first line from
the files.
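The string step of that suggestion can be sketched in plain Java; the Spark side is not reproduced here, but mapValues over the (filename, content) pair RDD from wholeTextFiles would apply the same function:

```java
public class DropHeader {
    // Strip everything up to and including the first newline (the header row).
    static String dropFirstLine(String content) {
        int nl = content.indexOf('\n');
        return nl < 0 ? "" : content.substring(nl + 1);
    }

    public static void main(String[] args) {
        // Stand-in for one file's content as wholeTextFiles would return it.
        String content = "col1,col2\nrow1a,row1b\nrow2a,row2b";
        System.out.println(dropFirstLine(content)); // prints only the data rows
    }
}
```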
-Original Message-
From: Hafiz Mujadid [mailto:hafizmujadi...@gmail.com]
Sent: Friday, January 09, 2015 11:48 AM
To:
You can also follow the link below. It works on a standalone Spark cluster.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
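For reference, the core of that getting-started guide is switching Hive's execution engine to Spark from the Hive session; a sketch (the master URL is a placeholder, not a value from this thread):

```
set hive.execution.engine=spark;
set spark.master=spark://<master-host>:7077;
```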
thanks
Somnath
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Thursday, January 08, 2015 2:21 AM
To: jamborta
Cc: user
Hi,
I have set up a Spark 1.2 standalone cluster and am trying to run Hive on Spark
by following the link below.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
I got the latest build of Hive on Spark from git and was trying to run a few
queries. The queries are
Hi ,
You can try reduceByKey also,
something like this:
JavaPairRDD<String, String> ones = lines
        .mapToPair(new PairFunction<String, String,
                String>() {
            @Override
            public Tuple2<String, String> call(String s)
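To show what the truncated reduceByKey suggestion is driving at, here is a hedged plain-Java analogue of the same map-then-merge pattern (the input lines are invented for illustration; on Spark this would be lines.mapToPair(...).reduceByKey(...)):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceByKeyDemo {
    // mapToPair step: split each line into (key, value);
    // reduceByKey step: merge values that share a key.
    static Map<String, Integer> reduceByKey(List<String> lines) {
        Map<String, Integer> reduced = new TreeMap<>();
        for (String line : lines) {
            String[] kv = line.split(" ");
            reduced.merge(kv[0], Integer.parseInt(kv[1]), Integer::sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        System.out.println(reduceByKey(Arrays.asList("a 1", "b 2", "a 3"))); // {a=4, b=2}
    }
}
```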