In my problem I have a number of intermediate JavaRDDs and would like to
be able to look at their sizes without destroying the RDD for subsequent
processing. persist() will do this, but these RDDs are big, persist() seems
expensive, and I am unsure which StorageLevel is needed. Is there a way
to ...
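A minimal sketch of one pattern for this (not from the thread; it assumes a SparkContext sc as in spark-shell, and MEMORY_AND_DISK is just one plausible level):

import org.apache.spark.storage.StorageLevel

val inter = sc.parallelize(1 to 1000000) // stand-in for an intermediate RDD
inter.persist(StorageLevel.MEMORY_AND_DISK) // spill to disk rather than recompute
println(s"size = ${inter.count()}") // materializes the RDD once
// ... subsequent processing reuses the persisted data ...
inter.unpersist() // release the storage when finished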
Yes, your broadcast should be about 300M, much smaller than 2G; I
didn't read your post carefully.
Broadcast in Python has been improved a lot since 1.1, so I think it
will work in 1.1 or the upcoming 1.2 release. Could you upgrade to 1.1?
Davies
On Tue, Nov 11, 2014 at 8:37 PM, bliuab wrote:
Hi:
I have a problem using the union method of RDD. It goes like this:
I have a function like
def hbaseQuery(area: String): RDD[Result] = ???
When I use hbaseQuery("aa").union(hbaseQuery("bb")).count(), it returns 0;
however, when I use it like this ...
Could you try jar tf on the assembly jar and grep for
netlib-native_system-linux-x86_64.so? -Xiangrui
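For example (the assembly jar name below is illustrative):

jar tf spark-assembly-1.2.0-SNAPSHOT.jar | grep netlib-native_system-linux-x86_64.so

No output would mean the native BLAS library did not make it into the assembly.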
On Tue, Nov 11, 2014 at 7:11 PM, jpl jlefe...@soe.ucsc.edu wrote:
Hi,
I am having trouble using the BLAS libs with the MLlib functions. I am
using org.apache.spark.mllib.clustering.KMeans (on a
The Pi example gives the same error in yarn mode:
HADOOP_CONF_DIR=/home/gs/conf/current ./spark-submit --class
org.apache.spark.examples.SparkPi --master yarn-client
../examples/target/spark-examples_2.10-1.2.0-SNAPSHOT.jar
What could be wrong here?
Could you provide the code of hbaseQuery? Maybe it does not support
parallel execution.
Best Regards,
Shixiong Zhu
2014-11-12 14:32 GMT+08:00 qiaou qiaou8...@gmail.com:
Hi:
I have a problem using the union method of RDD ...
When you call groupByKey(), try providing the number of partitions, like
groupByKey(100), depending on your data/cluster size.
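A minimal sketch (the pair RDD contents and the partition count are illustrative):

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3))) // stand-in pair RDD
val grouped = pairs.groupByKey(100) // 100 shuffle partitions; tune to your cluster
grouped.collect().foreach(println)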
Thanks
Best Regards
On Wed, Nov 12, 2014 at 6:45 AM, ankits ankitso...@gmail.com wrote:
I'm running a job that uses groupByKey(), so it generates a lot of shuffle
OK, here is the code:
def hbaseQuery: (String) => RDD[Result] = {
  val generateRdd = (area: String) => {
    val startRowKey = s"$area${RowKeyUtils.convertToHex(startId, 10)}"
    val stopRowKey = s"$area${RowKeyUtils.convertToHex(endId, 10)}"
...
Hi Friends,
I am trying to save a JSON file to Parquet, and I got the error
"Unsupported datatype TimestampType".
Does Parquet not support dates? Which Parquet version does Spark use? Is there
any workaround?
Here is the stack trace:
java.lang.RuntimeException: Unsupported datatype TimestampType
at
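One workaround to try (a sketch, not from the thread; the column names and paths are hypothetical) is casting the timestamp column to a string before writing:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val events = sqlContext.jsonFile("events.json") // hypothetical input path
events.registerTempTable("events")
// cast the unsupported TimestampType column before saving to Parquet
val converted = sqlContext.sql("SELECT CAST(ts AS STRING) AS ts, value FROM events")
converted.saveAsParquetFile("events.parquet") // hypothetical output path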
1. Use foreachRDD on the DStream, and on each RDD you can call groupBy(),
as sketched after this list.
2. DStream.count() returns a new DStream in which each RDD has a single
element generated by counting each RDD of this DStream.
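A minimal sketch (the socket source and the grouping key are illustrative):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10)) // 10-second batches
val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source
lines.foreachRDD { rdd =>
  // group the records within each batch; grouping by line length is illustrative
  rdd.groupBy(_.length).collect().foreach(println)
}
lines.count().print() // one element per batch: that batch's record count
ssc.start()
ssc.awaitTermination()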
Thanks
Best Regards
On Wed, Nov 12, 2014 at 2:49 AM, SK skrishna...@gmail.com wrote:
Dear Liu:
Thank you for your reply. I will set up an experimental environment for
Spark 1.1 and test it.
On Wed, Nov 12, 2014 at 2:30 PM, Davies Liu-2 [via Apache Spark User List]
ml-node+s1001560n1868...@n3.nabble.com wrote:
Yes, your broadcast should be about 300M, much smaller than 2G, I
This works!
But can you explain why it should be used like this?
--
qiaou
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, November 12, 2014, at 3:18 PM, Shixiong Zhu wrote:
You need to create a new configuration for each RDD. Therefore, val
hbaseConf = HBaseConfigUtil.getHBaseConfiguration should be ...
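A minimal sketch of the per-RDD pattern (HBaseConfigUtil comes from the poster's code; the scan setup and body are illustrative):

import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.rdd.RDD

def hbaseQuery(area: String): RDD[Result] = {
  // create a fresh configuration inside the function, so each RDD gets its own
  val hbaseConf = HBaseConfigUtil.getHBaseConfiguration
  hbaseConf.set(TableInputFormat.SCAN_ROW_START, area) // illustrative scan setup
  sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
    classOf[ImmutableBytesWritable], classOf[Result]).map(_._2)
}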
Hi,
I was also trying ISpark, but I couldn't even start the notebook. I am getting
the following error:
ERROR:tornado.access:500 POST /api/sessions (127.0.0.1) 10.15ms
referer=http://localhost:/notebooks/Scala/Untitled0.ipynb
How did you start the notebook?
Thanks & Regards,
Meethu M
Hi all,
I have noticed that the join operator has been translated to union plus
groupByKey instead of cogroup in PySpark. This change
will probably generate more shuffle stages, for example:
rdd1 = sc.parallelize(...).partitionBy(2)
rdd2 = sc.parallelize(...).partitionBy(2)
You can also build a Play 2.2.x + Spark 1.1.0 fat jar with sbt-assembly
for, e.g., yarn-client support or for use with spark-shell when debugging:
play.Project.playScalaSettings
libraryDependencies ~= { _ map {
  case m if m.organization == "com.typesafe.play" =>
    m.exclude("commons-logging",
The `conf` object will be sent to other nodes via Broadcast.
Here is the scaladoc of Broadcast:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.broadcast.Broadcast
In addition, the object v should not be modified after it is broadcast in
order to ensure that all nodes get the same value of the broadcast variable.
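A minimal sketch of the pattern (the map contents are illustrative):

// broadcast a read-only value once; executors read it via .value
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
val rdd = sc.parallelize(Seq("a", "b", "c"))
val resolved = rdd.map(k => lookup.value.getOrElse(k, 0)) // never mutate lookup.value
resolved.collect() // Array(1, 2, 0)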
Hi Sean,
I was following this link:
http://mund-consulting.com/Blog/Posts/file-operations-in-HDFS-using-java.aspx
But I was facing a FileSystem ambiguity error, and I really don't have any idea
how to go about doing this.
Can you please help me get started with this?
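If the ambiguity is between java.nio.file.FileSystem and org.apache.hadoop.fs.FileSystem (an assumption; the post does not show the error), fully qualifying the Hadoop classes avoids it. A sketch:

import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
val fs = org.apache.hadoop.fs.FileSystem.get(conf) // fully qualified to dodge the clash
val path = new org.apache.hadoop.fs.Path("/user/data") // illustrative path
println(fs.exists(path))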
On Wed, Nov 12, 2014