On 14 Jul 2015, at 12:22, Ted Yu
yuzhih...@gmail.com wrote:
Looking at Jenkins, the master branch compiles.
Can you try the following command?
mvn -Phive -Phadoop-2.6 -DskipTests clean package
What version of Java are you using?
Ted, Giles has stuck in
Hi, devs
I found that the case of 'Expression.resolved !=
(Expression.childrenResolved && checkInputDataTypes().isSuccess)'
occurs in the output of the Analyzer.
That is, some tests in o.a.s.sql.* fail if the code below is added in
CheckAnalysis:
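A sketch of such a check (illustrative only, assuming 1.4-era Catalyst
APIs; not the exact code):

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

def checkResolvedInvariant(plan: LogicalPlan): Unit = {
  plan.foreachUp { op =>
    op.expressions.foreach(_.foreach { e =>
      val expected = e.childrenResolved && e.checkInputDataTypes().isSuccess
      // fail on expressions whose resolved flag disagrees with the invariant
      assert(e.resolved == expected,
        s"$e: resolved=${e.resolved}, expected=$expected")
    })
  }
}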
The code can continue to be a good reference implementation, no matter
where it lives. In fact, it can be a better, more complete one, and
easier to update.
I agree that ec2/ needs to retain some kind of pointer to the new
location. Yes, maybe a script as well that does the checkout as you
say. We
Hi all,
I'm working on an ETL task with Spark. As part of this work, I'd like to
mark records with some info such as:
1. Whether the record is good or bad (e.g., Either)
2. Originating file and lines
Part of my motivation is to prevent errors with individual records from
stopping the entire
You shouldn't get dependencies you need from Spark, right? You declare
direct dependencies. Are we talking about re-scoping or excluding this
dep from Hadoop transitively?
On Wed, Jul 15, 2015 at 7:33 PM, Gil Vernik g...@il.ibm.com wrote:
Right, it's not currently a dependency in Spark.
If we
I'm considering a few approaches -- one of which is to provide new
functions like mapLeft, mapRight, filterLeft, etc.
But this all falls short with DataFrames. RDDs can easily be extended
from RDD[T] to RDD[Record[T]]. I guess with DataFrames, I could add
special columns?
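A minimal sketch of that wrapper idea (all names here are made up, not an
existing API):

import org.apache.spark.rdd.RDD

case class BadRecord(raw: String, error: String)
case class Record[T](value: Either[BadRecord, T], file: String, line: Long)

// mapRight applies f to good records only; bad records pass through unchanged
def mapRight[T, U](rdd: RDD[Record[T]])(f: T => U): RDD[Record[U]] =
  rdd.map(r => Record(r.value.right.map(f), r.file, r.line))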
On Wed, Jul 15, 2015
Yea - I'd just add a bunch of columns. Doesn't seem like that big of a deal.
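Something along these lines (column names made up; assumes a DataFrame df
with a "value" column):

import org.apache.spark.sql.functions._

val annotated = df
  .withColumn("src_file", lit("input.csv"))       // originating file
  .withColumn("is_good", col("value").isNotNull)  // good/bad marker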
On Wed, Jul 15, 2015 at 10:53 AM, RJ Nowling rnowl...@gmail.com wrote:
I'm considering a few approaches -- one of which is to provide new
functions like mapLeft, mapRight, filterLeft, etc.
But this all falls short
Right, it's not currently a dependency in Spark.
If we already mention it, is it possible to make it part of the current
dependencies, but only for Hadoop profiles 2.4 and up?
This would save a lot of headaches for those who use Spark + OpenStack Swift,
who currently need to manually edit pom.xml every time to add
Hi Bob,
Thanks for the email. You can select Spark as the project when you file a
JIRA ticket at https://issues.apache.org/jira/browse/SPARK
For `select 1 from $table where 0=1` -- if the database's optimizer doesn't
do constant folding and short-circuit execution, could the query end up
tableExists in
spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcUtils.scala uses
non-standard SQL (specifically, the LIMIT keyword) to determine whether a
table exists in a JDBC data source. This will cause an exception in
many/most JDBC databases that don't support the LIMIT keyword.
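A more portable probe could look like this (just a sketch of the idea, not
the current implementation; WHERE 1=0 is standard SQL, unlike LIMIT):

import java.sql.Connection
import scala.util.Try

def tableExists(conn: Connection, table: String): Boolean = Try {
  // succeeds iff the table is queryable; returns no rows on any database
  val stmt = conn.prepareStatement(s"SELECT 1 FROM $table WHERE 1=0")
  try stmt.executeQuery() finally stmt.close()
}.isSuccess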
Hey everyone,
Consider the following use of spark.sql.shuffle.partitions:
case class Data(
  A: String = f"${(math.random * 1e8).toLong}%09d",
  B: String = f"${(math.random * 1e8).toLong}%09d")
val dataFrame = (1 to 1000).map(_ => Data()).toDF
dataFrame.registerTempTable("data")
sqlContext.setConf(
Why does Spark need to depend on it? I'm missing that bit. If an
OpenStack artifact is needed for OpenStack, shouldn't OpenStack add
it? Otherwise everybody gets it in their build.
On Wed, Jul 15, 2015 at 7:52 PM, Gil Vernik g...@il.ibm.com wrote:
I mean currently users that wish to use Spark
One related note here is that we have a Java version of this that is
an abstract class - in the doc it says that it exists more or less to
allow for binary compatibility (it says it's for Java users, but
really Scala could use this also):
Actually the Java one is a concrete class.
On Wed, Jul 15, 2015 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote:
One related note here is that we have a Java version of this that is
an abstract class - in the doc it says that it exists more or less to
allow for binary compatibility (it
Or, alternatively, the bus could catch that error and ignore / log it,
instead of stopping the context...
On Wed, Jul 15, 2015 at 12:20 PM, Marcelo Vanzin van...@cloudera.com
wrote:
Hmm, the Java listener was added in 1.3, so I think it will work for my
needs.
Might be worth it to make it
It's bad that we expose a trait - even though we want to mix in stuff. We
should really audit all of these and expose only abstract classes for
anything beyond an extremely simple interface. That itself, however, would
break binary compatibility.
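A simplified illustration of the trade-off (not the actual SparkListener
source): an abstract class with no-op defaults lets us add a new callback
later without breaking already-compiled subclasses, whereas adding a method
to a trait breaks them at the bytecode level.

abstract class Listener {
  def onJobStart(): Unit = { }  // no-op defaults that subclasses override
  def onJobEnd(): Unit = { }
  // a new no-op callback can be appended here without recompiling subclasses
}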
On Wed, Jul 15, 2015 at 12:15 PM, Patrick Wendell
Hmm, the Java listener was added in 1.3, so I think it will work for my
needs.
Might be worth it to make it clear in the SparkListener documentation that
people should avoid using it directly. Or follow Reynold's suggestion.
On Wed, Jul 15, 2015 at 12:14 PM, Patrick Wendell pwend...@gmail.com
I mean currently users that wish to use Spark and configure Spark to use
OpenStack Swift need to manually edit Spark's pom.xml (main, core, yarn)
and add hadoop-openstack.jar to it, and then compile Spark.
My question is: why not include this dependency in Spark for Hadoop
profiles 2.4 and
Granted, the 1=0 approach is ugly, and it either assumes constant-folding
support or reads way too much data.
Submitted JIRA SPARK-9078 (thanks for pointers) and expounded on possible
solutions a little bit more there.
Cheers, and thanks, Bob
Per recent comments on SPARK-6442, I'd recommend not working on that one
for now. Instead, even if tasks are not that interesting to you, you
should try some small tasks at first to get used to contributing. I am
quite sure we'll want to solve SPARK-3703 by May 2016; that's pretty far in
the
I haven't gotten a response on user@ yet for these questions, but these are
probably better questions for dev@ anyway, aren't they? Could somebody on dev@
please respond?
Thanks,
Jonathan
From: Jonathan Kelly jonat...@amazon.com
Date: Wednesday, July 15, 2015 at 12:18
Hi All,
I'm happy to announce the Spark 1.4.1 maintenance release.
We recommend all users on the 1.4 branch upgrade to
this release, which contains several important bug fixes.
Download Spark 1.4.1 - http://spark.apache.org/downloads.html
Release notes -
Hi Burak,
I’ve modified my code as you suggested; however, it still leads to shuffling.
Could you suggest what’s wrong with my code, or provide example code for
block matrix multiplication that preserves data locality and does not cause
shuffling?
Modified code:
import
Hi Alexander,
I just noticed the error in my logic. There will always be a shuffle due to
the `cogroup`. `join` also uses cogroup, therefore a shuffle is inevitable.
However, the reduceByKey will not cause a shuffle. I forgot that
cogroup will try to match keys even if they don't exist on both sides.
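As a standalone illustration of the reduceByKey point (spark-shell sketch,
not your code):

import org.apache.spark.HashPartitioner

val part = new HashPartitioner(8)
val blocks = sc.parallelize(0 until 64).map(i => (i % 8, 1.0)).partitionBy(part)
// reduceByKey reuses the existing partitioner, so it adds no shuffle stage
val summed = blocks.reduceByKey(_ + _)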
Thanks Akhil! For the one where I change the REST client, how likely would it
be that a change like that goes through? Would it be rejected as an uncommon
scenario? I really don't want to have this as a separate fork of the branch.
Thanks,
Joel
From: Akhil Das
I attached a patch for HADOOP-12235.
BTW, OpenStack was not mentioned in the first email from Gil.
My email and Gil's second email were sent around the same moment.
Cheers
On Wed, Jul 15, 2015 at 2:06 AM, Steve Loughran ste...@hortonworks.com
wrote:
On 14 Jul 2015, at 12:22, Ted Yu
What if the map-side combine is not that necessary, given that it cannot
reduce the size of the shuffled data much (the key still needs to be
serialized for each value)? It can reduce the number of key-value pairs,
though, and potentially reduce the number of operations later (repartition
and groupBy).
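One way to compare the two behaviors in a spark-shell sketch: combineByKey
exposes the mapSideCombine flag directly.

import org.apache.spark.HashPartitioner

val pairs = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c")))
val grouped = pairs.combineByKey(
  (v: String) => List(v),                         // createCombiner
  (acc: List[String], v: String) => v :: acc,     // mergeValue
  (a: List[String], b: List[String]) => a ::: b,  // mergeCombiners
  new HashPartitioner(4),
  mapSideCombine = false)                         // skip the map-side combine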
On
We may be able to fix this from the Spark side by adding appropriate
exclusions in our Hadoop dependencies, right? If possible, I think that we
should do this.
On Wed, Jul 15, 2015 at 7:10 AM, Ted Yu yuzhih...@gmail.com wrote:
I attached a patch for HADOOP-12235
BTW, OpenStack was not