Yes, there's no such thing as writing a deserialized form to disk.
However, there are other persistence levels that store *serialized*
forms in memory. The "deserialized" in the level name refers only to the
in-memory representation: the objects are not serialized in memory in the
JVM. On disk, of course, they are always serialized.
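For example, in the Java API (a minimal sketch; rdd stands for any JavaRDD you want to persist):

    import org.apache.spark.api.java.StorageLevels;

    // Deserialized Java objects in memory (what cache() uses):
    rdd.persist(StorageLevels.MEMORY_ONLY);

    // Serialized bytes in memory -- more compact, but costs CPU to read back:
    rdd.persist(StorageLevels.MEMORY_ONLY_SER);

    // Whatever spills to disk is always written in serialized form:
    rdd.persist(StorageLevels.MEMORY_AND_DISK);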
On Sun, Aug 31, 2014 at 5:02
hi,
Is there a simple example of using JdbcRDD from Java rather than Scala? I am
trying to figure out the last parameter in the constructor of JdbcRDD.
thanks
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/JdbcRDD.html#JdbcRDD(org.apache.spark.SparkContext,
scala.Function0, java.lang.String, long, long, int, scala.Function1,
scala.reflect.ClassTag)
I don't think there is a completely Java-friendly version of this
class. However, you can still construct one from Java if you implement the
Scala function interfaces yourself and supply the ClassTag explicitly.
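A rough, untested sketch (the URL, query, and row mapping are placeholders;
jsc is assumed to be your JavaSparkContext):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import org.apache.spark.rdd.JdbcRDD;
    import scala.reflect.ClassTag$;
    import scala.runtime.AbstractFunction0;
    import scala.runtime.AbstractFunction1;

    // Scala's Function0/Function1 implemented by hand; they must be
    // Serializable because they get shipped to the workers.
    class ConnFactory extends AbstractFunction0<Connection>
        implements java.io.Serializable {
      public Connection apply() {
        try {
          return DriverManager.getConnection("jdbc:mysql://host/db"); // placeholder
        } catch (Exception e) { throw new RuntimeException(e); }
      }
    }

    class RowMapper extends AbstractFunction1<ResultSet, String>
        implements java.io.Serializable {
      public String apply(ResultSet rs) {
        try { return rs.getString(1); } // placeholder mapping
        catch (Exception e) { throw new RuntimeException(e); }
      }
    }

    // The last constructor parameter is the ClassTag for the element type:
    JdbcRDD<String> rdd = new JdbcRDD<String>(
        jsc.sc(), new ConnFactory(),
        "SELECT name FROM users WHERE id >= ? AND id <= ?", // placeholder query
        1L, 1000L, 3,
        new RowMapper(),
        ClassTag$.MODULE$.<String>apply(String.class));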
I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it.
Following How-to: Run a Simple Apache Spark App in CDH 5, I tried to
submit my job in local mode, Spark Standalone mode and YARN mode. I
successfully submitted my job in local mode and Standalone mode; however, I
noticed the following
I think -1 means your application master has not been started yet.
On Aug 31, 2014, at 23:02, Tao Xiao xiaotao.cs@gmail.com wrote:
I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it.
Following How-to: Run a Simple Apache Spark App in CDH 5, I tried to submit
my job in local mode, Spark
Is there a sample of how to do this?
I see 1.1 is out but cannot find samples of mapPartitions.
A Java sample would be very useful.
On Sat, Aug 30, 2014 at 10:30 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
In 1.1, you'll be able to get all of these properties using sortByKey, and
then
Matei,
It is good to hear that the restriction that keys need to fit in memory no
longer applies to combineByKey. However, join requiring keys to fit in
memory is still a big deal to me. Does it apply to both sides of the join,
or only one (while the other side is streaming)?
On Sat, Aug 30,
Hi Yana,
You are correct. What needs to be added is that besides RDDs being
checkpointed, the metadata that represents the execution of computations is
also checkpointed in Spark Streaming.
Upon driver recovery, the last batches (the ones already executed and the
ones that should have been executed but weren't) are re-run from the
checkpointed metadata.
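For reference, enabling this in the Java API is a matter of pointing the
context at a fault-tolerant checkpoint directory (a minimal sketch; the
path is a placeholder):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf conf = new SparkConf().setAppName("CheckpointExample");
    JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));

    // Turns on both data (RDD) and metadata checkpointing; the directory
    // should live on a fault-tolerant filesystem such as HDFS.
    ssc.checkpoint("hdfs:///checkpoints/my-app");  // placeholder path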
Hello friends:
I use the Cloudera/CDH5 version of Spark (v1.0.0 Spark RPMs), but the
following is also true when using the Apache Spark distribution built
against a locally installed Hadoop/YARN.
The problem:
If the following directory exists, */etc/hadoop/conf/*, and the pertinent
Just a comment on the recovery part.
Is it correct to say that the current Spark Streaming recovery design does not
consider re-computations (upon metadata lineage recovery) that depend on
blocks of data from the received stream?
https://issues.apache.org/jira/browse/SPARK-1647
Just to illustrate a
I think you're saying it's looking for /foo on HDFS and not on your
local file system?
If so, I would suggest either prefixing your local paths with file:
to be unambiguous, or unsetting HADOOP_HOME and HADOOP_CONF_DIR.
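For example (a minimal sketch, using the /foo path from your message):

    // Explicit scheme: read from the local filesystem even when
    // HADOOP_CONF_DIR points Spark at an HDFS deployment.
    JavaRDD<String> lines = sc.textFile("file:///foo");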
On Sun, Aug 31, 2014 at 10:17 PM, didata subscripti...@didata.us wrote:
Hello
Just to be clear, no operation requires all the keys to fit in memory, only
the values for each specific key. All the values for each individual key need
to fit, but the system can spill to disk across keys. Right now that applies
to both sides of the join, unless you do a broadcast join by hand with a
broadcast variable.
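Something like this, as a rough Java sketch (all names are made up; sc is a
JavaSparkContext, and the small side must fit in memory on each executor):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.PairFunction;
    import org.apache.spark.broadcast.Broadcast;
    import scala.Tuple2;

    // Collect the small side to the driver and broadcast it to every executor.
    // Copying into a HashMap keeps the broadcast value plainly serializable.
    final Broadcast<Map<Integer, String>> small =
        sc.broadcast(new HashMap<Integer, String>(smallPairs.collectAsMap()));

    // Stream over the big side, looking each key up in the broadcast map;
    // only the big side is iterated, so its keys never need to fit in memory.
    JavaPairRDD<Integer, Tuple2<String, String>> joined = bigPairs.mapToPair(
        new PairFunction<Tuple2<Integer, String>, Integer, Tuple2<String, String>>() {
          public Tuple2<Integer, Tuple2<String, String>> call(Tuple2<Integer, String> kv) {
            String matched = small.value().get(kv._1());  // null if no match
            return new Tuple2<Integer, Tuple2<String, String>>(
                kv._1(), new Tuple2<String, String>(kv._2(), matched));
          }
        });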
mapPartitions just gives you an Iterator of the values in each partition, and
lets you return an Iterator of outputs. For instance, take a look at
https://github.com/apache/spark/blob/master/core/src/test/java/org/apache/spark/JavaAPISuite.java#L694.
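In Java it looks something like this (a rough sketch; numbers stands for any
JavaRDD<Integer>):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    // One output per partition: the local sum of its elements.
    JavaRDD<Integer> perPartitionSums = numbers.mapPartitions(
        new FlatMapFunction<Iterator<Integer>, Integer>() {
          public Iterable<Integer> call(Iterator<Integer> it) {
            int sum = 0;
            while (it.hasNext()) sum += it.next();
            List<Integer> out = new ArrayList<Integer>();
            out.add(sum);
            return out;
          }
        });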
Matei
On August 31, 2014 at 12:26:51 PM,
hi Folks
Is there a function in Spark, like numpy's digitize, to discretize a
numerical variable?
Or even better:
Is there a way to use the functionality of the decision tree builder in
Spark MLlib, which splits data into bins in such a way that the split
variable best predicts the target value?
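To illustrate what I mean, digitize is basically just a map over the values
with fixed bin edges; a rough Java sketch (the edges are made up, and values
stands for any JavaRDD<Double>):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    // numpy.digitize-style bucketing: for each value, return the index
    // of the bin it falls into, given hand-chosen edges.
    final double[] edges = {0.0, 10.0, 100.0};  // made-up bin boundaries
    JavaRDD<Integer> binned = values.map(new Function<Double, Integer>() {
      public Integer call(Double x) {
        int bin = 0;
        while (bin < edges.length && x >= edges[bin]) bin++;
        return bin;
      }
    });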
Thanks Yi, I think your answers make sense.
We can see a series of messages with appMasterRpcPort: -1 followed by a
message with appMasterRpcPort: 0; perhaps that means we were waiting for
the application master to be started (appMasterRpcPort: -1), and later
the application master got started.
Hi everybody!
Now I'm doing something like this:
1) A user uploads an image to the server
2) The server processes that image using a database and Java + OpenCV
3) The server returns some generated result to the user
That is slow now, and if there are many users, it will get slower and
maybe will
You could use cogroup to combine RDDs in one RDD for cross reference processing.
e.g.
a.cogroup(b)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l) }
Best Regards,
Raymond Liu
-Original Message-
From: marylucy [mailto:qaz163wsx_...@hotmail.com]
Sent:
hi, all:
I am working on Hive from Spark now. I use Spark SQL (HiveFromSpark) for
calculating data and saving the results in a Hive table.
Now, I need to export the results in the Hive table to SQL Server. Is
there a way to do this?
Thank you all.
Try Sqoop?
What do you mean by exporting results to SQL Server?
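If Sqoop doesn't fit, plain JDBC from the executors also works. A rough,
untested Java sketch (the connection string, credentials, and table are made
up; results stands for a JavaPairRDD<String, Integer>):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Iterator;
    import org.apache.spark.api.java.function.VoidFunction;
    import scala.Tuple2;

    // One JDBC connection per partition, batching inserts into SQL Server.
    results.foreachPartition(
        new VoidFunction<Iterator<Tuple2<String, Integer>>>() {
          public void call(Iterator<Tuple2<String, Integer>> it) throws Exception {
            Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost:1433;databaseName=reports",  // made up
                "user", "password");
            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO results (k, v) VALUES (?, ?)");
            while (it.hasNext()) {
              Tuple2<String, Integer> row = it.next();
              ps.setString(1, row._1());
              ps.setInt(2, row._2());
              ps.addBatch();
            }
            ps.executeBatch();
            ps.close();
            conn.close();
          }
        });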
On Mon, Sep 1, 2014 at 10:41 AM, churly lin chury...@gmail.com wrote:
I am working on Hive from Spark now. I use Spark SQL (HiveFromSpark) for
calculating data and saving the results in a Hive table.
Now, I need to export the results