Hi,
As per the code, KryoSerialization uses the writeClassAndObject method, which
internally calls the writeClass method, which writes the class of the
object during serialization.
As per the documentation on the Spark tuning page, registering the class
will avoid that.
Am I missing
Hi, in a test on SparkSQL 1.3.0, multiple threads are doing selects on the
same SQLContext instance, but the exception below is thrown, so it looks
like SQLContext is NOT thread-safe? I think this is not the desired
behavior.
==
java.lang.RuntimeException: [1.1] failure: ``insert'' expected but
Actually this is a SQL parse exception; are you sure your SQL is right?
Sent from my iPhone
On Apr 30, 2015, at 18:50, Haopu Wang hw...@qilinsoft.com wrote:
Hi, in a test on SparkSQL 1.3.0, multiple threads are doing selects on the
same SQLContext instance, but the exception below is thrown, so it looks
like
I have reported the issue on JIRA:
https://issues.apache.org/jira/browse/SPARK-7276
On Thu, Apr 30, 2015 at 4:36 PM, alexandre Clement a.p.clem...@gmail.com
wrote:
Hi all,
I'm experiencing a serious performance problem when using withColumn on a
dataset with a large number of columns. It is
Hi all,
I'm experiencing a serious performance problem when using withColumn on a
dataset with a large number of columns. It is very slow: on a dataset with
100 columns it takes a few seconds.
The code snippet demonstrates the problem.
val custs = Seq(
  Row(1, "Bob", 21, 80.5),
  Row(2, "Bobby", 21,
Hi All:
Is there any plan to add drop-column(s) functionality to the DataFrame?
One can use the select function to do so, but I find that tedious when only
one or two columns in a large DataFrame are to be dropped.
Pandas has this functionality, which I find handy when constructing feature
vectors
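Until a built-in drop exists, the select workaround can at least be written generically; a minimal sketch (the Spark calls are shown in comments as a hedged assumption, the column bookkeeping itself is plain Scala):

```scala
// With Spark's DataFrame API the workaround would look roughly like:
//   val keep = df.columns.filterNot(Set("age", "score"))
//   df.select(keep.head, keep.tail: _*)

// The column bookkeeping itself is plain Scala:
val columns = Array("id", "name", "age", "score")
val toDrop  = Set("age", "score")
val keep    = columns.filterNot(toDrop)
println(keep.mkString(", "))  // prints "id, name"
```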
I filed a ticket: https://issues.apache.org/jira/browse/SPARK-7280
Would you like to give it a shot?
On Thu, Apr 30, 2015 at 10:22 AM, rakeshchalasani vnit.rak...@gmail.com
wrote:
Hi All:
Is there any plan to add drop-column(s) functionality to the DataFrame?
One can use the select function to
Cody Koeninger-2 wrote
What's your schema for the offset table, and what's the definition of
writeOffset?
The schema is the same as the one in your post: topic | partition | offset
The writeOffset is nearly identical:
def writeOffset(osr: OffsetRange)(implicit session: DBSession): Unit = {
Unfortunately, I think the SQLParser is not thread-safe. I would recommend
using HiveQL.
On Thu, Apr 30, 2015 at 4:07 AM, Wangfei (X) wangf...@huawei.com wrote:
Actually this is a SQL parse exception; are you sure your SQL is right?
Sent from my iPhone
On Apr 30, 2015, at 18:50, Haopu Wang
What's your schema for the offset table, and what's the definition of
writeOffset?
What key are you reducing on? Maybe I'm misreading the code, but it looks
like the per-partition offset is part of the key. If that's true then you
could just do your reduction on each partition, rather than
IMHO I would go with choice #1
Cheers
On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin r...@databricks.com wrote:
We definitely still have the name collision problem in SQL.
On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal
punya.bis...@gmail.com
wrote:
Do we still have to keep the
We're a group of experienced backend developers who are fairly new to Spark
Streaming (and Scala) and very interested in using the new (in 1.3)
DirectKafkaInputDStream impl as part of the metrics reporting service we're
building.
Our flow involves reading in metric events, lightly modifying some
Sure, I will try sending a PR soon.
On Thu, Apr 30, 2015 at 1:42 PM Reynold Xin r...@databricks.com wrote:
I filed a ticket: https://issues.apache.org/jira/browse/SPARK-7280
Would you like to give it a shot?
On Thu, Apr 30, 2015 at 10:22 AM, rakeshchalasani vnit.rak...@gmail.com
wrote:
Hi Twinkle,
Registering the class makes it so that writeClass only writes out a couple
bytes, instead of a full String of the class name.
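In configuration terms that means calling `conf.registerKryoClasses(Array(classOf[MyEvent]))` after setting `spark.serializer` to the KryoSerializer (MyEvent being a hypothetical class). A toy model of the saving, not Kryo's actual wire format:

```scala
// Registered classes are referenced by a small integer id; unregistered ones
// fall back to writing the fully-qualified class name as a String.
val registry = Map("com.example.MyEvent" -> 1)

def classTag(name: String): String =
  registry.get(name).map(id => s"#$id").getOrElse(name)

println(classTag("com.example.MyEvent"))  // prints "#1" -- a couple of bytes
println(classTag("com.example.Other"))    // the full name -- tens of bytes
```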
-Sandy
On Thu, Apr 30, 2015 at 4:13 AM, twinkle sachdeva
twinkle.sachd...@gmail.com wrote:
Hi,
As per the code, KryoSerialization used
i am not sure eol means much if it is still actively used. we have a lot of
clients with centos 5 (for which we still support python 2.4 in some form
or another, fun!). most of them are on centos 6, which means python 2.6. by
cutting out python 2.6 you would cut out the majority of the actual
I understand the concern about cutting out users who still use Java 6, and
I don't have numbers about how many people are still using Java 6.
But I want to say at a high level that I support deprecating older versions
of stuff to reduce our maintenance burden and let us use more modern
patterns
something to keep in mind: we can easily support java 6 for the build
environment, particularly if there's a definite EOL.
i'd like to fix our java versioning 'problem', and this could be a big
instigator... right now we're hackily setting java_home in test invocation
on jenkins, which really
In fact, you're using the 2-arg form of reduceByKey to shrink it down to
1 partition
reduceByKey(sumFunc, 1)
But you started with 4 kafka partitions? So they're definitely no longer
1:1
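A sketch of the two forms (hedged: `sumFunc` is assumed to be a plain sum; the Spark calls are in comments, the reduction semantics are modeled locally):

```scala
// Collapses the result to a single partition, breaking the 1:1 Kafka mapping:
//   stream.reduceByKey(sumFunc, 1)
// Omitting the count uses the default partitioner instead (typically the
// parent's partition count), keeping one output partition per Kafka partition:
//   stream.reduceByKey(sumFunc)

// The reduction itself, independent of partitioning:
val sumFunc: (Int, Int) => Int = _ + _
val records = Seq(("a", 1), ("b", 2), ("a", 3))
val reduced = records.groupBy(_._1).map { case (k, vs) =>
  k -> vs.map(_._2).reduce(sumFunc)
}
println(reduced("a"))  // prints 4
```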
On Thu, Apr 30, 2015 at 1:58 PM, Cody Koeninger c...@koeninger.org wrote:
This is what I'm suggesting,
Bumping this. Anyone of you having some familiarity with py4j interface in
pyspark?
thanks
2015-04-27 22:09 GMT-07:00 Stephen Boesch java...@gmail.com:
My intention is to add pyspark support for certain mllib spark methods. I
have been unable to resolve pickling errors of the form
I'd also support this. In general, I think it's good that we try to
have Spark support different versions of things (Hadoop, Hive, etc).
But at some point you need to weigh the costs of doing so against the
number of users affected.
In the case of Java 6, we are seeing increasing cost from this.
I'm in favor of ending support for Java 6. We should also articulate a
policy on how long we want to support current and future versions of Java
after Oracle declares them EOL (Java 7 will be in that bucket in a matter
of days).
Punya
On Thu, Apr 30, 2015 at 1:18 PM shane knapp
Hi Team,
Should we take this opportunity to lay out and evangelize a pattern for EOL of
dependencies? I propose we follow the official EOL of Java, Python, Scala,
etc., and add, say, 6-12-24 months depending on popularity.
Java 6 official EOL: Feb 2013. Add 6-12 months: Aug 2013 - Feb 2014 official
Cody Koeninger-2 wrote
In fact, you're using the 2-arg form of reduceByKey to shrink it down to
1 partition
reduceByKey(sumFunc, 1)
But you started with 4 kafka partitions? So they're definitely no longer
1:1
True. I added the second arg because we were seeing multiple threads
I'm firmly in favor of this.
It would also fix https://issues.apache.org/jira/browse/SPARK-7009 and
avoid any more of the long-standing 64K file limit thing that's still
a problem for PySpark.
As a point of reference, CDH5 has never supported Java 6, and it was
released over a year ago.
On Thu,
As for the idea, I'm +1. Spark is the only reason I still have jdk6
around - exactly because I don't want to cause the issue that started
this discussion (inadvertently using JDK7 APIs). And as has been
pointed out, even J7 is about to go EOL real soon.
Even Hadoop is moving away (I think 2.7
This has been discussed a few times in the past, but now that Oracle has ended
support for Java 6 for over a year, I wonder if we should just drop Java 6
support.
There is one outstanding issue Tom has brought to my attention: PySpark on
YARN doesn't work well with Java 7/8, but we have an outstanding
nicholas started it! :)
for java 6 i would have said the same thing about 1 year ago: it is foolish
to drop it. but i think the time is right about now.
about half our clients are on java 7 and the other half have active plans
to migrate to it within 6 months.
On Thu, Apr 30, 2015 at 3:57 PM,
+1 on ending support for Java 6.
BTW from https://www.java.com/en/download/faq/java_7.xml :
After April 2015, Oracle will no longer post updates of Java SE 7 to its
public download sites.
On Thu, Apr 30, 2015 at 1:34 PM, Punyashloka Biswal punya.bis...@gmail.com
wrote:
I'm in favor of ending
(On that note, I think Python 2.6 should be next on the chopping block
sometime later this year, but that’s for another thread.)
(To continue the parenthetical, Python 2.6 was in fact EOL-ed in October of
2013. https://www.python.org/download/releases/2.6.9/)
On Thu, Apr 30, 2015 at 3:18 PM
Thanks for the info.
On Fri, May 1, 2015 at 12:10 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Twinkle,
Registering the class makes it so that writeClass only writes out a couple
bytes, instead of a full String of the class name.
-Sandy
On Thu, Apr 30, 2015 at 4:13 AM, twinkle
Hey all,
We ran into some test failures in our internal branch (which builds
against Hive 1.1), and I narrowed it down to the fix below. I'm not
super familiar with the Hive integration code, but does this look like
a bug for other versions of Hive too?
This caused an error where some internal
FYI, after enough consideration, we, the Hadoop community, dropped support for
JDK 6 starting with the Apache Hadoop 2.7.x releases.
Thanks
+Vinod
On Apr 30, 2015, at 12:02 PM, Reynold Xin r...@databricks.com wrote:
This has been discussed a few times in the past, but now Oracle has ended
support for
Hi Michael,
It would be great to see changes to make hive integration less
painful, and I can test them in our environment once you have a patch.
But I guess my question is a little more geared towards the current
code; doesn't the issue I ran into affect 1.4 and potentially earlier
versions
Any PR open for this?
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Mima-test-failure-in-the-master-branch-tp11949p11950.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Looks like this has been taken care of:
commit beeafcfd6ee1e460c4d564cd1515d8781989b422
Author: Patrick Wendell patr...@databricks.com
Date: Thu Apr 30 20:33:36 2015 -0700
Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support"
On Thu, Apr 30, 2015 at 7:58 PM, zhazhan
[info] spark-sql: found 1 potential binary incompatibilities (filtered 129)
[error] * method sqlParser()org.apache.spark.sql.SparkSQLParser in class
org.apache.spark.sql.SQLContext does not have a correspondent in new version
[error] filter with: ProblemFilters.excludeMissingMethodProblem
--
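For reference, an intentional API removal would instead be whitelisted in the MiMa excludes (a hedged sketch of the usual pattern in `project/MimaExcludes.scala`; exact filter names vary by MiMa version, and in this thread the removal was unintentional, so the patch was reverted instead):

```scala
import com.typesafe.tools.mima.core._

// Acknowledge a deliberate binary-incompatible change so MiMa stops flagging it:
ProblemFilters.exclude[MissingMethodProblem](
  "org.apache.spark.sql.SQLContext.sqlParser")
```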
I reverted the patch that I think was causing this: SPARK-5213
Thanks
On Thu, Apr 30, 2015 at 7:59 PM, zhazhan zzh...@hortonworks.com wrote:
Any PR open for this?
But it is hard to know how long customers stay with their most recent
download.
Cheers
On Thu, Apr 30, 2015 at 2:26 PM, Sree V sree_at_ch...@yahoo.com.invalid
wrote:
If there is any possibility of getting the download counts, then we can use
it as EOS criteria as well. Say, if download counts
+1 for end of support for Java 6
On Thursday, April 30, 2015 3:08 PM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
FYI, after enough consideration, we, the Hadoop community, dropped support for
JDK 6 starting with the Apache Hadoop 2.7.x releases.
Thanks
+Vinod
On Apr 30, 2015, at
I finally isolated the issue to be related to the ActorSystem I reuse from
SparkEnv.get.actorSystem. This ActorSystem will contain the configuration
defined in my application jar's reference.conf in both local cluster case,
and in the case I use it directly in an extension to BaseRelation's
If there is any possibility of getting the download counts, then we can use it
as EOS criteria as well. Say, if download counts are lower than 30% (or another
number) of the lifetime highest, then it qualifies for EOS.
Thanking you.
With Regards
Sree
On Thursday, April 30, 2015 2:22 PM, Sree
Hey Marcelo,
Thanks for the heads up! I'm currently in the process of refactoring all
of this (to separate the metadata connection from the execution side) and
as part of this I'm making the initialization of the session not lazy. It
would be great to hear if this also works for your internal
Hi,
This follows the feature described in [1].
I'm trying to implement a custom persistence engine and a leader agent in
the Java environment.
Vis-à-vis Scala, when I implement the PersistenceEngine trait in Java, I
would have to implement methods such as readPersistedData,
We should change the trait to an abstract class, and then your problem will go
away.
Do you want to submit a pull request?
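A minimal sketch of why the abstract-class shape helps (method names simplified and hypothetical, not the actual PersistenceEngine signatures): a Scala abstract class compiles to a plain Java class, so a Java subclass simply extends it and overrides the abstract methods, with no trait-encoding details leaking through.

```scala
// Simplified stand-in for the real trait (names hypothetical):
abstract class PersistenceEngine {
  def persist(name: String, obj: AnyRef): Unit
  def read(prefix: String): Seq[AnyRef]
}

// A Java class can now simply `extends PersistenceEngine`; shown here in Scala:
class InMemoryEngine extends PersistenceEngine {
  private var store = Map.empty[String, AnyRef]
  def persist(name: String, obj: AnyRef): Unit = store += (name -> obj)
  def read(prefix: String): Seq[AnyRef] =
    store.collect { case (k, v) if k.startsWith(prefix) => v }.toSeq
}
```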
On Wed, Apr 29, 2015 at 11:02 PM, Niranda Perera niranda.per...@gmail.com
wrote:
Hi,
This follows the feature described in [1].
I'm trying to implement a