No, you don’t need to do anything special. Perhaps your application is getting
stuck somewhere? If you can share your code, someone may be able to help.
Mohammed
From: James Carman [mailto:ja...@carmanconsulting.com]
Sent: Friday, May 1, 2015 5:53 AM
To: user@spark.apache.org
Subject: Exiting
Hi,
Thanks for the reply.
The HBase CLI takes less than 500 ms for the same query.
I am running a simple query, i.e. SELECT * FROM Customers WHERE c_id='123123'.
Why would the same query which takes 500 ms at the HBase CLI end up taking
around 8 seconds via Spark SQL?
I am unable to understand this.
Hi,
I am using Spark 1.2.0 and I use Kryo serialization, but I get the
following exception.
java.io.IOException: com.esotericsoftware.kryo.KryoException:
java.lang.IndexOutOfBoundsException: Index: 3448, Size: 1
I would appreciate it if anyone could tell me how I can resolve this.
best,
/Shahab
Hi All
I am not getting any mail from this community?
Hello all!!
We've been prototyping some Spark applications to read messages from Kafka
topics. The application is quite simple: we use KafkaUtils.createStream to
receive a stream of CSV messages from a Kafka topic. We parse the CSV and
count the number of messages we get in each RDD. At a
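For reference, a minimal sketch of that kind of job (the ZooKeeper quorum, consumer group and topic name below are placeholders, not taken from the original mail):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaCsvCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaCsvCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // receiver-based stream: (zkQuorum, consumer group, topic -> number of threads)
    val messages = KafkaUtils.createStream(
      ssc, "zkhost:2181", "csv-counter", Map("csv-topic" -> 1))

    // the message value is the CSV payload; split it into fields and count per batch RDD
    val fields = messages.map(_._2).map(_.split(","))
    fields.foreachRDD(rdd => println(s"got ${rdd.count()} messages in this batch"))

    ssc.start()
    ssc.awaitTermination()
  }
}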
Can you try the patch from:
[SPARK-6913][SQL] Fixed java.sql.SQLException: No suitable driver found
Cheers
On Sat, Mar 28, 2015 at 12:41 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
This is from my Hive installation
-sh-4.1$ ls /apache/hive/lib | grep derby
derby-10.10.1.1.jar
val newRdd = myRdd.map(row => row ++ Array((row(1).toLong *
row(199).toLong).toString))
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22735.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
It used to exit without any problem for me. You can basically check in the
driver UI (that runs on port 4040) and see what exactly it's doing.
Thanks
Best Regards
On Fri, May 1, 2015 at 6:22 PM, James Carman ja...@carmanconsulting.com
wrote:
In all the examples, it seems that the spark application
Hi all!
I am trying to read a HANA database using Spark's JdbcRDD.
Here is my code:
def readFromHana() {
  val conf = new SparkConf()
  conf.setAppName("test").setMaster("local")
  val sc = new SparkContext(conf)
  val rdd = new JdbcRDD(sc, () => {
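For reference, a complete sketch of a JdbcRDD read along these lines (the JDBC URL, credentials, table, bounds and row mapping below are illustrative assumptions, not taken from this thread; the HANA JDBC driver jar also has to be on the classpath):

import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object ReadFromHana {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local")
    val sc = new SparkContext(conf)

    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:sap://hanahost:30015/?user=USER&password=PASS"),
      // JdbcRDD expects exactly two '?' placeholders; they are filled with the partition bounds
      "SELECT * FROM MEMBERS WHERE ID >= ? AND ID <= ?",
      1L,      // lowerBound
      100000L, // upperBound
      4,       // numPartitions
      rs => rs.getString(1))

    println(rdd.count())
    sc.stop()
  }
}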
Just use select() to create a new DataFrame with only the columns you want.
It's sort of the opposite of what you asked for -- but you can select all of
the columns minus the one you don't want. You could even use a filter to
remove just that one column on the fly:
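For example, a sketch only (the DataFrame df and the column name "unwanted" are made-up stand-ins):

// keep every column except the one you don't want
val remaining = df.columns.filter(_ != "unwanted")
val trimmed = df.select(remaining.head, remaining.tail: _*)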
It could be.
Thanks
Best Regards
On Fri, May 1, 2015 at 9:11 PM, roy rp...@njit.edu wrote:
Hi,
I have recently enabled log4j.rootCategory=WARN, console in the Spark
configuration, but after that spark.logConf=true has become ineffective.
So I just want to confirm if this is because
There was a similar discussion over here
http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccakz4c0s_cuo90q2jxudvx9wc4fwu033kx3-fjujytxxhr7p...@mail.gmail.com%3E
Thanks
Best Regards
On Fri, May 1, 2015 at 7:12 PM, Todd Nist tsind...@gmail.com wrote:
*Resending as I do not
Thanks Akhil,
I am trying to investigate this path. The Spark version is the same, but maybe
there is a difference in Hadoop.
On Sat, May 2, 2015 at 6:25 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Just make sure you are using the same version of Spark in your cluster
and in the project's build
it seems that on Spark Streaming 1.2 the filestream API may have a bug - it
doesn't detect new files when moving or renaming them on HDFS - only when
copying them, but that leads to a well-known problem with .tmp files which
get removed and make the Spark Streaming filestream throw an exception
Hi gang,
I'm giving SparkR a test drive and am bummed to discover that the SparkContext
API in SparkR is only a subset of what's available in stock Spark.
Specifically, I need to be able to pull data from Accumulo into SparkR. I can
do it with stock Spark but can't figure out how to make the
When I run my program with spark-submit everything is OK. But when I try to
run it in standalone mode I get the following exceptions:
((This is with
val df = sqlContext.jsonFile("./datos.json")
))
java.io.EOFException
[error] at
Here is the pull request, you may refer to this:
https://github.com/apache/spark/pull/2994
Thanks
Jerry
2015-05-01 14:38 GMT+08:00 Pavan Sudheendra pavan0...@gmail.com:
Link to the question:
http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception
In fact, sparkConf.set("spark.whateverPropertyYouWant", "value") gets shipped
to the executors.
Thanks
Best Regards
On Fri, May 1, 2015 at 2:55 PM, Michael Ryabtsev mich...@totango.com
wrote:
Hi,
We've had a similar problem, but with the log4j properties file.
The only working way we've found was
This mailing list sees a lot of traffic every day.
With such a volume of mail, you may find it hard to find discussions you
are interested in, and if you are the one starting discussions you may
sometimes feel your mail is going into a black hole.
We can't change the nature of this mailing list
I have a list of Cloudera jars which I need to provide in the --jars clause,
mainly for the HiveContext functionality I am using. However, many of these
jars have the version number as part of their names, which means the names
might change when I do a Cloudera upgrade.
Just a note here:
just make sure you are using the same version of Spark in your cluster
and in the project's build file.
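As an illustration (the version and artifact below are an assumption, not from this thread), the sbt build would pin the Spark dependency to whatever the cluster runs, typically marked "provided":

// build.sbt -- keep this version in sync with the Spark installed on the cluster
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"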
Thanks
Best Regards
On Fri, May 1, 2015 at 2:43 PM, Michael Ryabtsev (Totango)
mich...@totango.com wrote:
Hi everyone,
I have a spark application that works fine on a standalone Spark
Is it working now?
On 1 May 2015 at 13:43, James King jakwebin...@gmail.com wrote:
Oops! well spotted. Many thanks Shixiong.
On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote:
spark.history.fs.logDirectory is for the history server. For Spark
applications, they should
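For reference, the application-side settings that pair with it are the event log ones, e.g. (the path below is a placeholder):

import org.apache.spark.SparkConf

// the application writes its event logs here...
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-events")
// ...while spark.history.fs.logDirectory is set on the history server,
// pointing at the same location, so it can show the finished applications.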
Hi,
I have an RDD srdd containing (unordered) data like this:
s1_0, s3_0, s2_1, s2_2, s3_1, s1_3, s1_2, …
What I want is (it would be much better if they could be in ascending order):
srdd_s1:
s1_0, s1_1, s1_2, …, s1_n
srdd_s2:
s2_0, s2_1, s2_2, …, s2_n
srdd_s3:
s3_0, s3_1, s3_2, …, s3_n
…
…
First of all, thank you for your replies.
I was previously doing this via a normal JDBC connection and it worked
without problems. Then I liked the idea that Spark SQL could take care of
opening/closing the connection.
I also tried with single quotes, since that was my first guess, but it didn't
work.
bq. SELECT * FROM MEMBERS LIMIT ? OFFSET ?,
Have you tried dropping the LIMIT and OFFSET clauses from the above query?
Cheers
On Fri, May 1, 2015 at 1:56 PM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
Hi all!
I am trying to read a HANA database using Spark's JdbcRDD.
Here is my code:
def
Hi,
How did you check the number of splits in your file? Did you run your MR job
or calculate it?
The formula for split size is
max(minSize, min(maxSize, blockSize)). Can you check if it satisfies your
case?
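As a quick worked example of that formula (the numbers are made up):

// splitSize = max(minSize, min(maxSize, blockSize))
val minSize = 1L
val maxSize = Long.MaxValue
val blockSize = 128L * 1024 * 1024   // a 128 MB HDFS block
val splitSize = math.max(minSize, math.min(maxSize, blockSize))
// => 134217728, i.e. one split per block unless minSize/maxSize are overridden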
Thanks Regards,
Archit Thakur.
On Saturday, April 25, 2015, Wenlei Xie wenlei@gmail.com wrote:
Thanks for your reply! It is what I am after.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22740.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Looks like there were delays across Apache project mailing lists.
Emails are coming through now.
On May 2, 2015, at 9:14 AM, Jeetendra Gangele gangele...@gmail.com wrote:
Hi All
I am not getting any mail from this community?
I have figured it out in the meantime - when moving a file on HDFS it
preserves its timestamp, and the Spark filestream adapter seems to care about
timestamps as much as filenames - hence NEW files with OLD timestamps will
NOT be processed - yuk.
The hack you can use is to
Hi,
I am wondering if it is possible to submit, monitor and kill Spark applications
from another service.
I have written a service that does this:
parse user commands
translate them into understandable arguments to an already prepared Spark-SQL
application
submit the application along with arguments to
Did you look at the cogroup transformation or the cartesian transformation ?
Regards,
Olivier.
On Sat, May 2, 2015 at 22:01, Franz Chien franzj...@gmail.com wrote:
Hi all,
Can I group elements in an RDD into different groups and let each group share
elements? For example, I have 10,000
In the upcoming 1.4.0 release, SPARK-3468 should give you a better clue.
Cheers
On Fri, May 1, 2015 at 12:30 PM, Siddharth Ubale
siddharth.ub...@syncoms.com wrote:
Hi,
Thanks for the reply.
The HBase CLI takes less than 500 ms for the same query.
I am running a simple query, i.e. SELECT *
Now I am running up against some other problem while trying to schedule tasks:
15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
at
I guess:
val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity)
val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity)
val srdd_s3 = srdd.filter(_.startsWith("s3_")).sortBy(identity)
Regards,
Olivier.
On Sat, May 2, 2015 at 22:53, Yifan LI iamyifa...@gmail.com wrote:
Hi,
I have an RDD *srdd*
Sounds like a patch for a drop method...
On Sat, May 2, 2015 at 21:03, dsgriffin dsgrif...@gmail.com wrote:
Just use select() to create a new DataFrame with only the columns you want.
It's sort of the opposite of what you asked for -- but you can select all of
the columns minus the one you
Can you post your code? Otherwise there's not much we can do.
Regards,
Olivier.
On Sat, May 2, 2015 at 21:15, shahab shahab.mok...@gmail.com wrote:
Hi,
I am using Spark 1.2.0 and I use Kryo serialization, but I get the
following exception.
java.io.IOException:
This is coming in 1.4.0
https://issues.apache.org/jira/browse/SPARK-7280
On May 2, 2015, at 2:27 PM, Olivier Girardot ssab...@gmail.com wrote:
Sounds like a patch for a drop method...
On Sat, May 2, 2015 at 21:03, dsgriffin dsgrif...@gmail.com wrote:
Just use select() to create a new
Thanks, Olivier and Franz. :)
Best,
Yifan LI
On 02 May 2015, at 23:23, Olivier Girardot ssab...@gmail.com wrote:
I guess :
val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity)
val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity)
val srdd_s3 =
You could try repartitioning your listings RDD. Also, doing a collectAsMap
would basically bring all your data to the driver; in that case you might want
to set the storage level to memory-and-disk, though I'm not sure that will
help much on the driver.
Thanks
Best Regards
On Thu, Apr 30, 2015 at 11:10
Hi all,
Can I group elements in an RDD into different groups and let each group share
elements? For example, I have 10,000 elements in the RDD from e1 to e10000, and
I want to group and aggregate them by another mapping with size of 2000,
ex: ( (e1,e42), (e1,e554), (e3, e554)…… (2000th group))
My first
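One possible shape for this, as a hedged sketch (it assumes the 2,000 group definitions can be written as (element, group) pairs; all names and values below are made up):

import org.apache.spark.{SparkConf, SparkContext}

object SharedGroups {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shared-groups").setMaster("local"))

    // elements keyed by id (values are invented)
    val elements = sc.parallelize(Seq("e1" -> 1.0, "e3" -> 3.0, "e42" -> 42.0, "e554" -> 554.0))

    // group membership: the same element may belong to several groups
    val membership = sc.parallelize(Seq(
      "e1" -> "g1", "e42" -> "g1",
      "e1" -> "g2", "e554" -> "g2",
      "e3" -> "g3", "e554" -> "g3"))

    // the join lets each element flow into every group it belongs to, then aggregate per group
    val perGroup = membership.join(elements)
      .map { case (_, (groupId, value)) => (groupId, value) }
      .reduceByKey(_ + _)

    perGroup.collect().foreach(println)
    sc.stop()
  }
}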
Thanks for the answer. I am now trying to set HADOOP_HOME but the issue still
persists. Also, I can see only windows-utils.exe in my HADOOP_HOME, but no
WINUTILS.EXE.
I do not have Hadoop installed on my system, as I am not using HDFS, but I
am using Spark 1.3.1 prebuilt with Hadoop 2.6. Am I