Steve,
It was indeed a protocol buffers issue. I am able to build Spark now.
Thanks.
On Mon, Jun 29, 2015 at 7:37 AM, Steve Loughran ste...@hortonworks.com
wrote:
On 29 Jun 2015, at 11:27, Iulian Dragoș iulian.dra...@typesafe.com
wrote:
On Mon, Jun 29, 2015 at 3:02 AM, Alessandro
I am building the current master branch with Scala 2.11 following these
instructions:
Building for Scala 2.11
To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11
property:
dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
This bug still exists in Spark 1.4.0. Is there a workaround for it?
https://issues.apache.org/jira/browse/SPARK-7944
Thanks,
Alex
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Enjoy!
Alex
On Mon, Jan 19, 2015 at 6:44 PM, Jeff Wang jingjingwang...@gmail.com
wrote:
Hi:
I would like to contribute to the code of Spark. Can I join the community?
Thanks,
Jeff
All,
I'm getting out of memory exceptions in SparkSQL GROUP BY queries. I have
plenty of RAM, so I should be able to brute-force my way through, but I
can't quite figure out what memory option affects what process.
My current memory configuration is the following:
export
Regards
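For reference, a minimal sketch of the settings that usually matter in this situation (the property names are standard Spark / Spark SQL options for the 1.x line; the app name and values below are placeholders, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("groupby-oom-repro")            // placeholder name
  .set("spark.executor.memory", "16g")        // heap of each executor JVM, where aggregation buffers live
  .set("spark.storage.memoryFraction", "0.3") // shrink the cache share so more heap is left for execution (pre-1.6 memory model)
val sc = new SparkContext(conf)

val sqlContext = new SQLContext(sc)
// More shuffle partitions means smaller per-task aggregation state for a GROUP BY.
sqlContext.setConf("spark.sql.shuffle.partitions", "400")

Note that the driver's heap generally has to be set before its JVM starts, e.g. with spark-submit's --driver-memory flag, rather than through SparkConf at runtime.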
On Mon, Jan 19, 2015 at 11:36 AM, Alessandro Baretta
alexbare...@gmail.com wrote:
All,
I'm getting out of memory exceptions in SparkSQL GROUP BY queries. I have
plenty of RAM, so I should be able to brute-force my way through, but I
can't quite figure out what memory option affects
/scala/org/apache/spark/sql/execution/SparkStrategies.scala
In most common use cases (e.g. inner equi join), filters are pushed below
the join or into the join. Doing a cartesian product followed by a filter
is too expensive.
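To make that concrete, a small sketch (the tables and columns are invented and assumed to be registered as temporary tables):

// Equi-join: the equality on customer_id lets Catalyst plan a hash join and push
// the amount predicate below/into the join.
val fast = sqlContext.sql(
  "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.amount > 100")

// Non-equi condition: with no equality to hash on, the plan degenerates into a
// cartesian product followed by a filter, i.e. |orders| x |customers| candidate rows are examined.
val slow = sqlContext.sql(
  "SELECT o.id, c.name FROM orders o JOIN customers c ON o.amount > c.credit_limit")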
On Thu, Jan 15, 2015 at 7:39 AM, Alessandro Baretta alexbare
, 2015 at 7:53 AM, Alessandro Baretta alexbare...@gmail.com
wrote:
Reynold,
Thanks for the heads up. In general, I strongly oppose the use of
private to restrict access to certain parts of the API, the reason being
that I might find the need to use some of the internals of a library from
my
Reynold,
Thanks for the heads up. In general, I strongly oppose the use of private
to restrict access to certain parts of the API, the reason being that I
might find the need to use some of the internals of a library from my own
project. I find that a @DeveloperApi annotation serves the same
, 2015, Alessandro Baretta alexbare...@gmail.com
wrote:
Cody,
Maybe I'm not getting this, but it doesn't look like this page is
describing a priority queue scheduling policy. What this section discusses
is how resources are shared between queues. A weight-1000 pool will get
1000 times more
11, 2015 at 7:36 AM, Alessandro Baretta
alexbare...@gmail.com
wrote:
Cody,
While I might be able to improve the scheduling of my jobs by using a
few
different pools with weights equal to, say, 1, 1e3 and 1e6, effectively
getting a small handful of priority classes. Still
Is it possible to specify a priority level for a job, such that the active
jobs might be scheduled in order of priority?
Alex
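For the archive, a rough sketch of that weighted-pool workaround (pool names, weights, and the allocation-file path are placeholders):

// fairscheduler.xml referenced below would contain something along the lines of:
//   <allocations>
//     <pool name="low"><weight>1</weight></pool>
//     <pool name="normal"><weight>1000</weight></pool>
//     <pool name="high"><weight>1000000</weight></pool>
//   </allocations>
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("weighted-pools")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)

// Each submitting thread picks its pool; jobs in "high" then get roughly 1000x the
// cluster share of jobs in "normal" while both are active.
sc.setLocalProperty("spark.scheduler.pool", "high")
sc.parallelize(1 to 1000000).count()

As discussed above, this only skews the shares between pools; it is not a strict priority queue, so lower-weight jobs still make (slow) progress.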
it on the dev
list? That's where we track issues like this. Thanks!
- Patrick
On Wed, Dec 31, 2014 at 8:48 PM, Alessandro Baretta
alexbare...@gmail.com wrote:
Here's what the console shows:
15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0,
whose tasks have all
Here's what the console shows:
15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0,
whose tasks have all completed, from pool
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at
ParquetTableOperations.scala:326) finished in 5493.549 s
15/01/01 01:12:29 INFO
nanoseconds now. Since passing too many flags is ugly, now I need the whole
SQLContext, so that we can put more flags there.
Thanks,
Daoyuan
*From:* Michael Armbrust [mailto:mich...@databricks.com]
*Sent:* Tuesday, December 30, 2014 10:43 AM
*To:* Alessandro Baretta
*Cc:* Wang, Daoyuan; dev
Sorry! My bad. I had stale spark jars sitting on the slave nodes...
Alex
On Tue, Dec 30, 2014 at 4:39 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Gents,
I tried #3820. It doesn't work. I'm still getting the following exceptions:
Exception in thread Thread-45
(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any input on how to address this issue would be welcome.
Alex
On Tue, Dec 30, 2014 at 5:21 PM, Alessandro Baretta alexbare...@gmail.com
I think I might have figured it out myself. Here's a pull request for you
guys to check out:
https://github.com/apache/spark/pull/3855
I successfully tested this code on my cluster.
On Tue, Dec 30, 2014 at 11:01 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Here's a more meaningful
wrote:
Hi Alex,
I'll create JIRA SPARK-4985 for date type support in Parquet, and
SPARK-4987 for timestamp type support. For decimal type, I think we only
support decimals that fit in a long.
Thanks,
Daoyuan
-Original Message-
From: Alessandro Baretta [mailto:alexbare...@gmail.com
the plan is there to make sure that whatever we do is
going to be compatible long term.
Michael
On Mon, Dec 29, 2014 at 8:13 AM, Alessandro Baretta alexbare...@gmail.com
wrote:
Daoyuan,
Thanks for creating the jiras. I need these features by... last week, so
I'd be happy to take care
How, O how can this be? Doesn't the SQLContext hold a reference to the
SparkContext?
Alex
I am building Spark with sbt off of branch 1.2. I'm using the following
command:
sbt/sbt -Pyarn -Phadoop-2.3 assembly
(http://spark.apache.org/docs/latest/building-spark.html#building-with-sbt)
Although the jar file I obtain does contain the proper version of the
hadoop libraries (v. 2.4), the
PM, Alessandro Baretta alexbare...@gmail.com
wrote:
I am building Spark with sbt off of branch 1.2. I'm using the following
command:
sbt/sbt -Pyarn -Phadoop-2.3 assembly
(
http://spark.apache.org/docs/latest/building-spark.html#building-with-sbt
)
Although the jar file I obtain does
Michael,
I'm having trouble storing my SchemaRDDs in Parquet format with SparkSQL,
due to my RDDs having DateType and DecimalType fields. What would it
take to add Parquet support for these Catalyst types? Are there any other
Catalyst types for which there is no Parquet support?
Alex
Fellow Sparkers,
I'm rather puzzled at the submitJob API. I can't quite figure out how it is
supposed to be used. Is there any more documentation about it?
Also, is there any simpler way to multiplex jobs on the cluster, such as
starting multiple computations in as many threads in the driver and
On Mon, Dec 22, 2014 at 1:32 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Fellow Sparkers,
I'm rather puzzled at the submitJob API. I can't quite figure out how it
is
supposed to be used. Is there any more documentation about it?
Also, is there any simpler way to multiplex jobs
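On that second point, a minimal sketch of the threads-in-the-driver approach (the RDDs and thread-pool size are arbitrary, and a SparkContext named sc is assumed). SparkContext is thread-safe, so actions submitted from different threads become independent jobs that the scheduler can interleave:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// A small driver-side thread pool; each action run on it becomes its own Spark job.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

val rddA = sc.parallelize(1 to 1000000)
val rddB = sc.parallelize(1 to 1000000)

val jobA = Future { rddA.map(_ * 2).count() }
val jobB = Future { rddB.filter(_ % 3 == 0).count() }

// Both jobs are now running concurrently; block until the results come back.
println(s"A: ${Await.result(jobA, Duration.Inf)}, B: ${Await.result(jobB, Duration.Inf)}")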
All,
I noticed that while some operations that return RDDs are very cheap, such
as map and flatMap, some are quite expensive, such as union and groupByKey.
I'm referring here to the cost of constructing the RDD Scala value, not the
cost of collecting the values contained in the RDD. This does not
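A tiny sketch of the kind of measurement behind that observation (sizes and partition counts are arbitrary; no action is triggered, so only the construction of the RDD values themselves is timed):

import org.apache.spark.SparkContext._   // 1.x-style implicit for pair-RDD operations such as groupByKey

def time[T](label: String)(body: => T): T = {
  val t0 = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - t0) / 1e6}%.2f ms")
  result
}

val base  = sc.parallelize(1 to 1000, numSlices = 1000)
val other = sc.parallelize(1 to 1000, numSlices = 1000)

val mapped  = time("map")        { base.map(_ + 1) }
val flat    = time("flatMap")    { base.flatMap(x => Seq(x, x)) }
val unioned = time("union")      { base.union(other) }
val grouped = time("groupByKey") { base.map(x => (x % 10, x)).groupByKey() }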
On December 18, 2014 at 1:04:54 AM, Alessandro Baretta (
alexbare...@gmail.com) wrote:
All,
I noticed that while some operations that return RDDs are very cheap, such
as map and flatMap, some are quite expensive, such as union and
groupByKey.
I'm referring here to the cost of constructing the RDD
, Dec 17, 2014 at 11:24 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Well, what do you suggest I run to test this? But more importantly, what
information would this give me?
On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee denny.g@gmail.com wrote:
Oh, it makes sense of gsutil scans
Michael and other Spark SQL junkies,
As I read through the Spark API docs, in particular those for the
org.apache.spark.sql package, I can't seem to find details about the Scala
classes representing the various SparkSQL DataTypes, for instance
DecimalType. I find DataType classes in
Hao
-Original Message-
From: Alessandro Baretta [mailto:alexbare...@gmail.com]
Sent: Friday, December 12, 2014 6:37 AM
To: Michael Armbrust; dev@spark.apache.org
Subject: Where are the docs for the SparkSQL DataTypes?
Michael and other Spark SQL junkies,
As I read through the Spark
Hello,
I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
of the Rows in the RDD are malformed--that is, they do not conform to the
schema defined by the StructType. When running a select statement on this
SchemaRDD I would expect SparkSQL to either reject the malformed
if you try to manipulate the data, but otherwise it
will pass it through.
I have written some debugging code (developer API, not guaranteed to be
stable) though that you can use.
import org.apache.spark.sql.execution.debug._
schemaRDD.typeCheck()
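For anyone else hitting this, a minimal reproduction of the setup described above (the field names and the deliberately bad row are invented; the imports follow the 1.2-era programming guide and moved in later releases):

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age",  IntegerType, nullable = false)))

// The second row is malformed: "age" holds a String instead of an Int.
val rows = sc.parallelize(Seq(Row("alice", 34), Row("bob", "not-a-number")))
val schemaRDD = sqlContext.applySchema(rows, schema)

// applySchema does not validate, so a plain select may pass the bad value through;
// the debug helper above walks the data and reports rows that do not match the schema.
import org.apache.spark.sql.execution.debug._
schemaRDD.typeCheck()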
On Wed, Dec 10, 2014 at 6:19 PM, Alessandro
will try to search for JIRAs or create new ones and update this thread.
-Manish
On Monday, November 17, 2014, Alessandro Baretta alexbare...@gmail.com
wrote:
Manish,
Thanks for pointing me to the relevant docs. It is unfortunate that
absolute error is not supported yet. I can't seem to find
and deviance as loss functions but I don't think anyone is planning
to work on it yet. :-)
-Manish
On Mon, Nov 17, 2014 at 11:11 AM, Alessandro Baretta
alexbare...@gmail.com wrote:
I see that, as of v. 1.1, MLlib supports regression and classification
tree
models. I assume this means
Fellow Sparkers,
I am new here and still trying to learn to crawl. Please, bear with me.
I just pulled f90ad5d from https://github.com/apache/spark.git and am
running the compile command in the sbt shell. This is the error I'm seeing:
[error]
use ?
Cheers
On Tue, Nov 4, 2014 at 2:08 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Fellow Sparkers,
I am new here and still trying to learn to crawl. Please, bear with me.
I just pulled f90ad5d from https://github.com/apache/spark.git and am
running the compile command
it because it's faster than
Maven.
Nick
On Tue, Nov 4, 2014 at 8:03 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Nicholas,
Yes, I saw them, but they refer to Maven, and I'm under the impression
that sbt is the preferred way of building Spark. Is Maven indeed the right
way? Anyway, as per
Hello,
Is anyone open to doing some consulting work on Spark in San Mateo?
Thanks.
Alex