Hi,
I need to find the storage locations (node IDs) of each partition of a
replicated RDD in Spark. I mean, if an RDD is replicated twice, I want to
find the two nodes for each partition where it is stored.
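For illustration, something along these lines would give me what I am after
(just a rough sketch using developer APIs; the storage level and variable
names are only examples):

import org.apache.spark.storage.StorageLevel

// assumption: rdd is the replicated RDD, persisted and materialized first
rdd.persist(StorageLevel.MEMORY_ONLY_2)
rdd.count()

// getExecutorStorageStatus is a DeveloperApi; one StorageStatus per executor
for (status <- sc.getExecutorStorageStatus;
     blockId <- status.blocks.keys
     if blockId.name.startsWith("rdd_" + rdd.id + "_")) {
  println(s"${blockId.name} is stored on ${status.blockManagerId.host}")
}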
The Spark WebUI has a page that depicts the data distribution of each
RDD. But, I
Thanks for providing that additional background, Josh.
It looks like many people on that Google Groups thread wanted a better
interface than is offered by the Apache mailing lists. Some even raised the
idea of a bi-directional bridge
This does not appear to be what the asker wanted, since that approach makes one big
string. groupByKey is correct after parsing to key-value pairs.
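For example, after parsing each line into (key, value) pairs, something along
these lines (just a sketch; the path, delimiter, and column positions are
assumptions):

// assumption: plain text input, one record per line, comma-separated
val lines = sc.textFile("/data/data.csv")
val pairs = lines.map { line =>
  val fields = line.split(",")
  (fields(0), fields(1))   // assumption: key in column 0, value in column 1
}
val grouped = pairs.groupByKey()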
On Dec 26, 2014 3:55 AM, Somnath Pandeya somnath_pand...@infosys.com
wrote:
Hi,
You can also try reduceByKey,
Something like this:
JavaPairRDD<String,
I like the idea and the hope that it turns 2+ places for discussions into
1, but in practice I think it will just turn it into 3+. The only thing I
can imagine is making a tool like this an overlay. Does that require much
integration work and does it affect anyone who can't use it?
People won't
Hi,
Thank you all very much for your replies.
I was able to get it with groupByKey.
Here is my code:
import au.com.bytecode.opencsv.CSVParser
val data = sc.textFile("/data/data.csv")
def pLines(lines: Iterator[String]) = {
  val parser = new CSVParser()
  lines.map { l =>
    val vs = parser.parseLine(l)
    (vs(0), vs(1))  // assuming key in the first column, value in the second
  }
}
val grouped = data.mapPartitions(pLines).groupByKey()
Hi, I'm facing serious issues with my Spark application not recognizing the
classes in an uber jar: sometimes it recognizes them, sometimes it does not. Even
adding external jars using setJars does not always help. Is anyone else
facing a similar issue? I'm using the latest 1.2.0 version.
I'm trying to do some operations with windows and intervals.
I get data every 15 seconds, and I want to have a window of 60 seconds
with a batch interval of 15 seconds.
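For context, the setup I am describing looks roughly like this (a sketch; the
app name, host, and port are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("WindowExample")
val ssc = new StreamingContext(conf, Seconds(15))   // 15-second batch interval

// placeholder source: ncat listening on localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)

// 60-second window, sliding every 15 seconds
val windowed = lines.window(Seconds(60), Seconds(15))
windowed.foreachRDD { rdd => println(s"events in window: ${rdd.count()}") }

ssc.start()
ssc.awaitTermination()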
I'm injecting data with ncat. If I inject 3 logs in the same interval,
I get into the "do something" every 15 seconds during one
Instead of setJars, you could try addJar and see if the issue still exists.
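For example, something like this on the driver (the path is just a placeholder):

sc.addJar("/path/to/your-uber-jar.jar")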
Thanks
Best Regards
On Fri, Dec 26, 2014 at 3:26 PM, critikaled isasmani@gmail.com wrote:
Hi, I'm facing serious issues with my Spark application not recognizing the
classes in an uber jar: sometimes it recognizes them, sometimes
Will this output from stderr help?
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/12/26 10:13:44 INFO CoarseGrainedExecutorBackend: Registered signal
handlers for [TERM, HUP, INT]
14/12/26 10:13:44 WARN NativeCodeLoader: Unable to load native-hadoop
library
You cannot pass your jobConf object inside any of the transformation functions
in Spark (like map, mapPartitions, etc.) since
org.apache.hadoop.mapreduce.Job is not Serializable. You can use
KryoSerializer (see this doc:
http://spark.apache.org/docs/latest/tuning.html#data-serialization). We
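A minimal sketch of enabling Kryo for data serialization, as the tuning guide
above describes (the app and class names are only illustrative):

import org.apache.spark.{SparkConf, SparkContext}

case class MyRecord(id: Long, name: String)   // illustrative data class used in the job

val conf = new SparkConf()
  .setAppName("KryoExample")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyRecord]))
val sc = new SparkContext(conf)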
I'm trying to understand why it's not working, so I added some println calls
to check what the code was executing:
def ruleSqlInjection(lines: ReceiverInputDStream[String]) = {
  println(1)  // prints just once, when I start the program
  val filterSql = lines.filter(line =
Oh, I didn't understand what I was doing, my fault (too many parties this
Christmas). I thought windows worked in some other, weird way. Sorry for the
questions.
2014-12-26 13:42 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
I'm trying to understand why it's not working, so I added some println calls
to
Hi,
I need to find the storage locations (node IDs) of each partition of a
replicated RDD in Spark. I mean, if an RDD is replicated twice, I want to
find the two nodes for each partition where it is stored.
The Spark WebUI has a page that depicts the data distribution of each
RDD. But, I need
Thanks for the replies. Hopefully this will not be too difficult to fix.
Why not support multiple paths by overloading the parquetFile method to
take a collection of strings? That way we wouldn't need to choose an appropriate
delimiter.
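In the meantime, one possible workaround is to load each path separately and
union the results (a sketch; the paths are placeholders, and it assumes
SchemaRDD's unionAll):

// placeholder paths
val paths = Seq("/data/logs-2014-12-25.parquet", "/data/logs-2014-12-26.parquet")
val combined = paths.map(sqlContext.parquetFile).reduce(_ unionAll _)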
On Thu, Dec 25, 2014 at 3:46 AM, Cheng, Hao hao.ch...@intel.com wrote:
Greetings!
I'm trying to do something similar, and having a very bad time of it.
What I start with is
key1: (col1: val-1-1, col2: val-1-2, col3: val-1-3, col4: val-1-4, ...)
key2: (col1: val-2-1, col2: val-2-2, col3: val-2-3, col4: val-2-4, ...)
What I want (what I have been asked to produce
Here is a sketch of what you need to do off the top of my head and based on
a guess of what your RDD is like:
val in: RDD[(K, Seq[(C, V)])] = ...
in.flatMap { case (key, colVals) =>
  colVals.map { case (col, value) =>
    (col, (key, value))
  }
}.groupByKey()
So the problem with both input and output
Did you receive any response on this? I am trying to load HBase classes
and am getting the same error, py4j.protocol.Py4JError: Trying to call a
package, even though $HBASE_HOME/lib/* had already been added to
compute-classpath.sh.
2014-10-21 16:02 GMT-07:00 Mike Sukmanowsky
As of Spark 1.2 you can do streaming k-means; see the examples here:
http://spark.apache.org/docs/latest/mllib-clustering.html#examples-1
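A minimal sketch of that API (the directories, dimensions, and parameters are
only illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("StreamingKMeansExample"), Seconds(10))

// placeholder directories; each line is assumed to be space-separated numbers
val trainingData = ssc.textFileStream("/data/train").map(s => Vectors.dense(s.split(' ').map(_.toDouble)))
val testData = ssc.textFileStream("/data/test").map(s => Vectors.dense(s.split(' ').map(_.toDouble)))

val model = new StreamingKMeans()
  .setK(5)                   // illustrative number of clusters
  .setDecayFactor(1.0)
  .setRandomCenters(10, 0.0) // 10-dimensional random initial centers

model.trainOn(trainingData)
model.predictOn(testData).print()

ssc.start()
ssc.awaitTermination()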
Best,
Reza
On Fri, Dec 26, 2014 at 1:36 AM, vishnu johnfedrickena...@gmail.com wrote:
Hi,
Say I have created a clustering model using KMeans for 100 million
In case JDK 1.7 or higher is used to build, --skip-java-test needs to be
specified.
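For example, adding the flag to the command quoted below:
./make-distribution.sh --tgz --skip-java-test -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive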
FYI
On Thu, Dec 25, 2014 at 5:03 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
The following command works
./make-distribution.sh --tgz -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive
Hi, I built the 1.2.0 version of Spark against a single-node Hadoop 2.6.0
installed by Ambari 1.7.0. The ./bin/run-example SparkPi 10 command executes
on my local Mac 10.9.5 and on the CentOS virtual machine which hosts Hadoop, but I
can't run the SparkPi example inside YARN; it seems there's
Hello, I am zigen.
I am using Spark SQL 1.1.0.
I want to use Spark SQL 1.2.0,
but my Spark application gets a compile error.
Spark 1.1.0 had DataType.DecimalType,
but Spark 1.2.0 does not have DataType.DecimalType.
Why?
JavaDoc (Spark 1.1.0)