Generally you can use |-Dsun.io.serialization.extendedDebugInfo=true| to
enable serialization debugging information when serialization exceptions
are raised.
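A minimal sketch (not from the thread) of turning that flag on for both the driver and the executors through SparkConf; the extraJavaOptions property names are standard Spark settings, and the app name is made up:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("serialization-debug")  // illustrative
  .set("spark.driver.extraJavaOptions", "-Dsun.io.serialization.extendedDebugInfo=true")
  .set("spark.executor.extraJavaOptions", "-Dsun.io.serialization.extendedDebugInfo=true")
val sc = new SparkContext(conf)  // subsequent NotSerializableExceptions now carry extended debug info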
On 12/24/14 1:32 PM, bigdata4u wrote:
I am trying to use SQL over Spark Streaming using Java, but I am getting
Serialization
Hao and Lam - I think the issue here is that |registerRDDAsTable| only
creates a temporary table, which is not visible to the Hive metastore.
Michael once gave a workaround for creating an external Parquet
table:
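(The workaround itself is not reproduced in this digest; the following is only a rough sketch of the distinction, not Michael's original recipe, assuming Spark SQL 1.2 with a HiveContext, a Hive version that supports STORED AS PARQUET, and made-up paths and columns:)

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Visible only within this context, never recorded in the Hive metastore:
val schemaRdd = hiveContext.jsonFile("people.json")
schemaRdd.registerTempTable("people_tmp")

// One way to make the data visible to Hive: write Parquet files and point
// an external table at them.
schemaRdd.saveAsParquetFile("hdfs:///warehouse/people_parquet")
hiveContext.sql(
  """CREATE EXTERNAL TABLE people (name STRING, age INT)
    |STORED AS PARQUET
    |LOCATION 'hdfs:///warehouse/people_parquet'""".stripMargin)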
The 64-bit Java 8 RPM downloaded from the official Oracle site solved my problem,
and I did not need to set a max heap size; the final memory shown at the end of the
Maven build was 81/1943M. I want to learn Spark, so I have no restriction on
choosing a Java version.
Guru Medasani, thanks for the tip.
I will repeat the info that
That command is still wrong. It is -Xmx3g with no =.
On Dec 24, 2014 9:50 AM, Vladimir Protsenko protsenk...@gmail.com wrote:
Java 8 rpm 64bit downloaded from official oracle site solved my problem.
And I need not set max heap size, final memory shown at the end of maven
build was 81/1943M. I
This is my problem. I use MySQL to store the Hive metadata, and I can get what
I want when I run 'show tables' in the Hive shell. But on the same machine, when
I use spark-sql to execute the same command (show tables), I get errors.
I looked at the Hive metastore log and found these errors:
2014-12-24 05:04:59,874
Hi Ted,
The referenced command works, but where can I get the deployable binaries?
Xiaobo Gu
-- Original --
From: Ted Yu;yuzhih...@gmail.com;
Send time: Wednesday, Dec 24, 2014 12:09 PM
To: guxiaobo1...@qq.com;
Cc:
Dear All,
We are trying to share RDDs across different sessions of the same web
application (Java). We need to share a single RDD between those sessions. As
we understand from some posts, this is possible through Spark-JobServer.
Are there any guidelines you can provide to set up Spark-JobServer for Maven
Hello,
I have a piece of code that runs inside Spark Streaming and tries to get
some data from a RESTful web service (that runs locally on my machine). The
code snippet in question is:
Client client = ClientBuilder.newClient();
WebTarget target =
Hi Roc,
Spark SQL 1.2.0 can only work with Hive 0.12.0 or Hive 0.13.1
(controlled by compilation flags); versions prior to 1.2.0 only work with
Hive 0.12.0. So Hive 0.15.0-SNAPSHOT is not an option.
I would like to add that this is due to a backward-compatibility issue with
the Hive metastore, AFAIK.
Thanks. Bad mistake.
2014-12-24 14:02 GMT+04:00 Sean Owen so...@cloudera.com:
That command is still wrong. It is -Xmx3g with no =.
On Dec 24, 2014 9:50 AM, Vladimir Protsenko protsenk...@gmail.com
wrote:
Java 8 rpm 64bit downloaded from official oracle site solved my problem.
And I need
Your guess is right: there are two incompatible versions of
Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
but its transitive dependencies may, or your transitive dependencies
may.
I don't see Jersey in Spark's dependency tree except from HBase tests,
which in turn
Hi,
I have been using this without any issues with Spark 1.1.0, but after upgrading
to 1.2.0, saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase
just hangs, even when testing with the example from the stock hbase_outputformat.py.
Is anyone having the same issue? (And were you able to solve it?)
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:
I'd take a look with 'mvn dependency:tree' on your own code first.
Maybe you are including JavaEE 6 for example?
For reference, my complete pom.xml looks like:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi=
It seems like YARN depends on an older version of Jersey, namely 1.9:
https://github.com/apache/spark/blob/master/yarn/pom.xml
When I modified my dependencies to have only:

<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-core</artifactId>
  <version>1.9.1</version>
</dependency>
Turns out that I was just being idiotic and had assigned so much memory to
Spark that the O/S was ending up continually swapping. Apologies for the
noise.
Phil
On Wed, Dec 24, 2014 at 1:16 AM, Andrew Ash and...@andrewash.com wrote:
Hi Phil,
This sounds a lot like a deadlock in Hadoop's
That could well be it -- oops, I forgot to run with the YARN profile
and so didn't see the YARN dependencies. Try the userClassPathFirst
option to try to make your app's copy take precedence.
The second error is really a JVM bug, but it comes from having too little
memory available for the unit tests.
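A hedged sketch of how that might be set; the property name spark.yarn.user.classpath.first is an assumption for this Spark-on-YARN version (check your release's configuration docs), and the app name is made up:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("user-classpath-first")  // illustrative
  .set("spark.yarn.user.classpath.first", "true")  // prefer the application's jars over YARN's copies
val sc = new SparkContext(conf)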
Sean,
Thanks a lot for the important information, especially userClassPathFirst.
Cheers,
Emre
On Wed, Dec 24, 2014 at 3:38 PM, Sean Owen so...@cloudera.com wrote:
That could well be it -- oops, I forgot to run with the YARN profile
and so didn't see the YARN dependencies. Try the
Hi,
Is there any plan to add an SVDPlusPlus-based recommender to MLlib? It is
implemented in Mahout, based on this paper:
http://research.yahoo.com/files/kdd08koren.pdf
Regards,
Prafulla.
bq. even when testing with the example from the stock hbase_outputformat.py
Can you take jstack of the above and pastebin it ?
Thanks
On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
have been using this without any issues with spark 1.1.0 but after
Hi all!,
I have an RDD[(Int, Int, Double, Double)] where the first two Int values are the id
and product, respectively. I trained an implicit ALS model and want to
make predictions from this RDD. I tried two things, but I think both ways are
the same.
1- Convert this RDD to RDD[(Int, Int)] and
hey guys
One of my input records has a problem that makes the code fail.

var demoRddFilter = demoRdd.filter(line =>
  !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") ||
  !line.contains("primaryid$caseid$caseversion"))
var demoRddFilterMap = demoRddFilter.map(line => line.split('$')(0) + "~" +
DOH! Looks like I did not have enough coffee before I asked this :-) I added
the if statement...

var demoRddFilter = demoRdd.filter(line =>
  !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") ||
  !line.contains("primaryid$caseid$caseversion"))
var demoRddFilterMap = demoRddFilter.map(line => {
  if
I don't believe that works since your map function does not return a
value for lines shorter than 13 tokens. You should use flatMap and
Some/None. (You probably want to not parse the string 5 times too.)
val demoRddFilterMap = demoRddFilter.flatMap { line =>
  val tokens = line.split('$')
  // Skip short lines instead of returning no value from map; the projection here is illustrative.
  if (tokens.length >= 13) Some(tokens(0) + "~" + tokens(1)) else None
}
Although not elegantly, I got the output via my code, but I totally agree on the
parsing 5 times (that's really bad). Will add your suggested logic and check it
out. I have a long way to the finish line; I am re-architecting my entire
Hadoop code and getting it onto Spark.
Check out what I do at
I have generally been impressed with the way jsonFile eats just about any
JSON data model, but I am getting this error when I try to ingest this file:
Unexpected close marker ']': expected '}
Here are the commands from the pyspark shell:
from pyspark.sql import HiveContext
hiveContext =
Thanks for the reply.
I am testing this with a small amount of data, and what is happening is that
whenever there is data in the Kafka topic, Spark does not throw the exception;
otherwise it does.
Thanks, Tarun
Date: Wed, 24 Dec 2014 16:23:30 +0800
From: lian.cs@gmail.com
To: bigdat...@live.com;
Hi,
The MatrixFactorizationModel consists of two RDDs. When you use the second
method, Spark tries to serialize both RDDs for the .map() function,
which is not possible, because RDDs are not serializable. Therefore you
receive the NullPointerException. You must use the first method.
Best,
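A hedged sketch of the first approach; the trainImplicit parameters are made up, and data is assumed to be the RDD[(Int, Int, Double, Double)] from the question:

import org.apache.spark.mllib.recommendation.{ALS, Rating}

val ratings = data.map { case (user, product, rating, _) => Rating(user, product, rating) }
val model = ALS.trainImplicit(ratings, 10, 10)  // rank and iterations are illustrative

// Hand an RDD of (user, product) pairs to the model on the driver instead of
// calling model.predict inside a map over the RDD.
val userProducts = data.map { case (user, product, _, _) => (user, product) }
val predictions = model.predict(userProducts)  // RDD[Rating]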
Did you get past this issue? I'm trying to get this to work as well. You
have to compile the spark-ganglia-lgpl artifact into your application:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-ganglia-lgpl_2.10</artifactId>
When people have questions about Spark, there are 2 main places (as far as
I can tell) where they ask them:
- Stack Overflow, under the apache-spark tag
http://stackoverflow.com/questions/tagged/apache-spark
- This mailing list
The mailing list is valuable as an independent place for
this is it (jstack of particular yarn container) - http://pastebin.com/eAdiUYKK
thanks, Antony.
On Wednesday, 24 December 2014, 16:34, Ted Yu yuzhih...@gmail.com wrote:
bq. even when testing with the example from the stock hbase_outputformat.py
Can you take jstack of the above and
I went over the jstack but didn't find any call related to hbase or
zookeeper.
Do you find anything important in the logs ?
Looks like the container launcher was waiting for the script to return some
result:
1. at
Thanks
I debugged this further and below is the cause:

Caused by: java.io.NotSerializableException: org.apache.spark.sql.api.java.JavaSQLContext
- field (class: com.basic.spark.NumberCount$2, name: val$sqlContext, type: class org.apache.spark.sql.api.java.JavaSQLContext)
- object
I just ran it by hand from the pyspark shell. Here are the steps:

pyspark --jars
/usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar

conf = {"hbase.zookeeper.quorum": "localhost",
...     "hbase.mapred.outputtable": "test", ...
...     "mapreduce.outputformat.class":
bq. hbase.zookeeper.quorum: localhost
Are you running the HBase cluster in standalone mode?
Is the hbase-client jar in the classpath?
Cheers
On Wed, Dec 24, 2014 at 4:11 PM, Antony Mayi antonym...@yahoo.com wrote:
I just run it by hand from pyspark shell. here is the steps:
pyspark --jars
Hi,
On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
I want to convert a SchemaRDD into an RDD of String. How can we do that?
Currently I am doing it like this, which is not converting correctly; there is
no exception, but the resultant strings are empty.
Here is my code:
Hehe,
You might also try the following, which I think is equivalent:
schemaRDD.map(_.mkString(","))
On Wed, Dec 24, 2014 at 8:12 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
I want to convert a schemaRDD into RDD
No, there is not. Can you open a JIRA?
On Tue, Dec 23, 2014 at 6:33 PM, Daniel Siegmann daniel.siegm...@velos.io
wrote:
I am trying to load a Parquet file which has a comma in its name. Yes,
this is a valid file name in HDFS. However, sqlContext.parquetFile
interprets this as a
The various spark contexts generally aren't serializable because you can't
use them on the executors anyway. We made SQLContext serializable just
because it gets pulled into scope more often due to the implicit
conversions it contains. You should try marking the variable that holds
the context
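A hedged sketch, assuming the truncated suggestion is to mark the reference @transient so it stays out of closures shipped to executors; the class and method names are illustrative:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

class NumberCount(@transient val sqlContext: SQLContext) extends Serializable {
  // The closure below must not touch sqlContext; with @transient the context is
  // simply not serialized along with the enclosing instance.
  def lengths(lines: RDD[String]): RDD[Int] = lines.map(_.length)
}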
Each JSON object needs to be on a single line since this is the boundary
the TextFileInputFormat uses when splitting up large files.
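A hedged sketch of the expected layout, in Scala for illustration (the same rule applies to the pyspark call in the question; the file name and fields are made up):

// jsonFile expects one complete JSON object per line, e.g.:
//   {"name": "alice", "age": 30}
//   {"name": "bob", "age": 25}
// rather than one pretty-printed object or array spread over several lines.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val people = hiveContext.jsonFile("people.jsonl")
people.printSchema()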
On Wed, Dec 24, 2014 at 12:34 PM, elliott cordo elliottco...@gmail.com
wrote:
I have generally been impressed with the way jsonFile eats just about
any json data
I am running it in yarn-client mode and I believe hbase-client is part of the
spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar which I am submitting at
launch.
Adding another jstack taken during the hang - http://pastebin.com/QDQrBw70 -
this one is of the CoarseGrainedExecutorBackend.
Also, HBase itself works OK:

hbase(main):006:0> scan 'test'
ROW                          COLUMN+CELL
 key1                        column=f1:asd, timestamp=1419463092904, value=456
1 row(s)
Hi,
We have a requirement to save RDD output to HDFS with a saveAsTextFile-like
API, but we need to overwrite the data if it already exists. I'm not sure whether
current Spark supports such an operation, or whether I need to check this manually.
There's a thread on the mailing list that discussed this
Is it sufficient to set spark.hadoop.validateOutputSpecs to false?
http://spark.apache.org/docs/latest/configuration.html
- Patrick
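A hedged sketch of that setting (the app name and output path are made up; note that existing files under the path can be silently clobbered):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("overwrite-example")  // illustrative
  .set("spark.hadoop.validateOutputSpecs", "false")  // skip the "output directory already exists" check
val sc = new SparkContext(conf)

sc.parallelize(1 to 10).saveAsTextFile("hdfs:///tmp/out")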
On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai saisai.s...@intel.com wrote:
Hi,
We have such requirements to save RDD output to HDFS with saveAsTextFile
like
I am wondering if we could provide a more friendly API, rather than a
configuration setting, for this purpose. What do you think, Patrick?
Cheng Hao
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:22 PM
To: Shao, Saisai
Cc: user@spark.apache.org;
So the behavior of overwriting existing directories IMO is something
we don't want to encourage. The reason why the Hadoop client has these
checks is that it's very easy for users to do unsafe things without
them. For instance, a user could overwrite an RDD that had 100
partitions with an RDD that
Thanks Patrick for your detailed explanation.
BR
Jerry
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:43 PM
To: Cheng, Hao
Cc: Shao, Saisai; user@spark.apache.org; d...@spark.apache.org
Subject: Re: Question on saveAsTextFile
What options should I use when running the make-distribution.sh script? I tried

./make-distribution.sh --hadoop.version 2.6.0 --with-yarn -with-hive --with-tachyon --tgz

but nothing came out.
Regards
-- Original --
From: guxiaobo1982;guxiaobo1...@qq.com;