Re: Announcing the official Spark Job Server repo

2014-03-24 Thread Evan Chan
Andy, doesn't Marathon handle fault tolerance amongst its apps?  I.e., if
you say that N instances of an app are running and one shuts off,
then it spins up another one, no?

The tricky thing was that I was planning to use Akka Cluster to
coordinate, but Mesos itself can be used to coordinate as well, which
is an overlap, and I didn't want to make job server HA
reliant only on Mesos... Anyways, we can discuss offline if needed.
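(For concreteness, a hedged sketch of the Marathon side of this: an app definition that asks Marathon to keep N job-server instances alive. Field names follow Marathon's app API, but the id, command, and sizing below are illustrative assumptions, not Ooyala's actual deployment.)

```python
import json

# Hedged sketch of a Marathon app definition for the job server.
# The id, cmd, and resource numbers are illustrative, not a tested config.
app = {
    "id": "spark-jobserver",
    "cmd": "./server_start.sh",  # Marathon injects $PORT into the task's env
    "instances": 3,              # Marathon restarts any instance that dies
    "cpus": 1.0,
    "mem": 1024,
}
marathon_payload = json.dumps(app)
```

Posting a payload like this to Marathon's apps endpoint is what gives you the "keep N running" behavior discussed above.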

On Thu, Mar 20, 2014 at 1:35 AM, andy petrella andy.petre...@gmail.com wrote:
 Heya,
 That's cool you've already hacked something for this in the scripts!

 I have a related question, how would it work actually. I mean, to have this
 Job Server fault tolerant using Marathon, I would guess that it will need
 to be itself a Mesos framework, and able to publish its resources needs.
 And also, for that, the Job Server has to be aware of the resources needed
 by the Spark drivers that it will run, which is not as easy to guess,
 unless it is provided by the job itself?

 I didn't check the Job Server deeply enough, so it might already be the case
 (or I'm expressing something completely dumb ^^).

 For sure, we'll try to share it when we'll reach this point to deploy using
 marathon (should be planned for April)

 greetz and again, Nice Work Evan!

 Ndi

 On Wed, Mar 19, 2014 at 7:27 AM, Evan Chan e...@ooyala.com wrote:

 Andy,

 Yeah, we've thought of deploying this on Marathon ourselves, but we're
 not sure how much Mesos we're going to use yet.   (Indeed if you look
 at bin/server_start.sh, I think I set up the PORT environment var
 specifically for Marathon.)This is also why we have deploy scripts
 which package into .tar.gz, again for Mesos deployment.

 If you do try this, please let us know.  :)

 -Evan


 On Tue, Mar 18, 2014 at 3:57 PM, andy petrella andy.petre...@gmail.com
 wrote:
  tad! That's awesome.
 
  A quick question, does someone have insights regarding having such
  JobServers deployed using Marathon on Mesos?
 
  I'm thinking about an arch where Marathon would deploy and keep the Job
  Servers running along with part of the whole set of apps deployed on it
  regarding the resources needed (à la Jenkins).
 
  Any idea is welcome.
 
  Back to the news, Evan + Ooyala team: Great Job again.
 
  andy
 
  On Tue, Mar 18, 2014 at 11:39 PM, Henry Saputra henry.sapu...@gmail.com
 wrote:
 
  W00t!
 
  Thanks for releasing this, Evan.
 
  - Henry
 
  On Tue, Mar 18, 2014 at 1:51 PM, Evan Chan e...@ooyala.com wrote:
   Dear Spark developers,
  
   Ooyala is happy to announce that we have pushed our official, Spark
   0.9.0 / Scala 2.10-compatible, job server as a github repo:
  
   https://github.com/ooyala/spark-jobserver
  
   Complete with unit tests, deploy scripts, and examples.
  
   The original PR (#222) on incubator-spark is now closed.
  
   Please have a look; pull requests are very welcome.
   --
   --
   Evan Chan
   Staff Engineer
   e...@ooyala.com  |
 



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |




-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
I also have a really minor fix for SPARK-1057  (upgrading fastutil),
could that also make it in?

-Evan


On Sun, Mar 23, 2014 at 11:01 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Sorry this request is coming in a bit late, but would it be possible to
 backport SPARK-979[1] to branch-0.9 ? This is the patch for randomizing
 executor offers and I would like to use this in a release sooner rather
 than later.

 Thanks
 Shivaram

 [1]
 https://github.com/apache/spark/commit/556c56689bbc32c6cec0d07b57bd3ec73ceb243e#diff-8ef3258646b0e6a4793d6ad99848eacd


 On Thu, Mar 20, 2014 at 10:18 PM, Bhaskar Dutta bhas...@gmail.com wrote:

 Thank You! We plan to test out 0.9.1 on YARN once it is out.

 Regards,
 Bhaskar

 On Fri, Mar 21, 2014 at 12:42 AM, Tom Graves tgraves...@yahoo.com wrote:

  I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running
  on YARN - JIRA and  [SPARK-1051] On Yarn, executors don't doAs as
  submitting user - JIRA in.  The pyspark one I would consider more of an
  enhancement so might not be appropriate for a point release.
 
 
   [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YA...
  org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set at
 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
  at org.apache.spark.schedule...
 
 
   [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA
  This means that they can't write/read from files that the yarn user
  doesn't have permissions to but the submitting user does.
 
 
 
 
 
  On Thursday, March 20, 2014 1:35 PM, Bhaskar Dutta bhas...@gmail.com
  wrote:
 
  It will be great if
  SPARK-1101https://spark-project.atlassian.net/browse/SPARK-1101:
  Umbrella
  for hardening Spark on YARN can get into 0.9.1.
 
  Thanks,
  Bhaskar
 
 
  On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
  tathagata.das1...@gmail.comwrote:
 
Hello everyone,
  
   Since the release of Spark 0.9, we have received a number of important
  bug
   fixes and we would like to make a bug-fix release of Spark 0.9.1. We
 are
   going to cut a release candidate soon and we would love it if people
 test
   it out. We have backported several bug fixes into the 0.9 and updated
  JIRA
   accordingly
  
 
 https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
   .
   Please let me know if there are fixes that were not backported but you
   would like to see them in 0.9.1.
  
   Thanks!
  
   TD
  
 




-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: new Catalyst/SQL component merged into master

2014-03-24 Thread Evan Chan
Hi Michael,

Congrats, this is really neat!

What thoughts do you have regarding adding indexing support and
predicate pushdown to this SQL framework? Right now we have custom
bitmap indexing to speed up queries, so we're really curious about
the architectural direction.

-Evan


On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust
mich...@databricks.com wrote:

 It will be great if there are any examples or usecases to look at ?

 There are examples in the Spark documentation.  Patrick posted an updated
 copy here so people can see them before 1.0 is released:
 http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html

 Does this feature have different use cases than Shark, or is it cleaner
 now that the Hive dependency is gone?

 Depending on how you use this, there is still a dependency on Hive (By
 default this is not the case.  See the above documentation for more
 details).  However, the dependency is on a stock version of Hive instead of
 one modified by the AMPLab.  Furthermore, Spark SQL has its own optimizer,
 instead of relying on the Hive optimizer.  Long term, this is going to give
 us a lot more flexibility to optimize queries specifically for the Spark
 execution engine.  We are actively porting over the best parts of shark
 (specifically the in-memory columnar representation).

 Shark still has some features that are missing in Spark SQL, including
 SharkServer (and years of testing).  Once SparkSQL graduates from Alpha
 status, it'll likely become the new backend for Shark.
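(For readers wondering what "its own optimizer" looks like in practice: Catalyst applies rewrite rules over a tree of logical operators. Below is a toy, non-Spark illustration of one such rule, pushing a filter beneath a projection; these are NOT Spark SQL's real classes.)

```python
from dataclasses import dataclass
from typing import Any, List

# Toy logical operators, in the spirit of (but not identical to) Catalyst.
@dataclass
class Scan:
    table: str

@dataclass
class Project:
    columns: List[str]
    child: Any

@dataclass
class Filter:
    columns: List[str]  # columns the predicate references
    predicate: str
    child: Any

def push_filter_below_project(plan):
    """Rule: Filter(Project(x)) -> Project(Filter(x)) when the filter only
    touches columns the projection keeps."""
    if (isinstance(plan, Filter) and isinstance(plan.child, Project)
            and set(plan.columns) <= set(plan.child.columns)):
        proj = plan.child
        return Project(proj.columns,
                       Filter(plan.columns, plan.predicate, proj.child))
    return plan

plan = Filter(["age"], "age > 21", Project(["name", "age"], Scan("people")))
optimized = push_filter_below_project(plan)
```

A real optimizer applies many such rules repeatedly until the plan stops changing; this sketch just shows the shape of a single rewrite.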



-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
@Tathagata,  the PR is here:
https://github.com/apache/spark/pull/215

On Mon, Mar 24, 2014 at 12:02 AM, Tathagata Das
tathagata.das1...@gmail.com wrote:
 @Shivaram, That is a useful patch, but I am a bit afraid to merge it in.
 Randomizing the executor offers has performance implications, especially for
 Spark Streaming. The non-randomized ordering of allocating machines to tasks
 was subtly helping to speed up certain window-based shuffle operations.  For
 example, corresponding shuffle partitions in multiple shuffles using the
 same partitioner were likely to be co-located; that is, shuffle partition 0
 was likely to be on the same machine for multiple shuffles. While this is
 not a mechanism to rely on, randomization may lead to
 performance degradation. So I am afraid to merge this one without
 understanding the consequences.

 @Evan, I have already cut a release! You can submit the PR and we can merge
 it into branch-0.9. If we have to cut another release, then we can include it.
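(A toy illustration of the co-location effect described above, under the simplifying assumption that executor offers arrive in a fixed order: deterministic offer ordering maps the same partition index to the same host across shuffles, while randomized offers generally do not.)

```python
import random

hosts = ["host-a", "host-b", "host-c"]

def assign(num_partitions, offers):
    # Round-robin shuffle partitions over executor offers, in offer order.
    return {p: offers[p % len(offers)] for p in range(num_partitions)}

# Deterministic offer ordering: partition i lands on the same host in
# every shuffle, so corresponding partitions stay co-located.
s1 = assign(6, hosts)
s2 = assign(6, hosts)

# Randomized offer ordering: co-location is no longer guaranteed.
rng = random.Random()
r1 = assign(6, rng.sample(hosts, len(hosts)))
r2 = assign(6, rng.sample(hosts, len(hosts)))
```

This is a deliberately naive model of offer handling, but it captures why randomizing offers can break the accidental co-location that window-based shuffles were benefiting from.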



 On Sun, Mar 23, 2014 at 11:42 PM, Evan Chan e...@ooyala.com wrote:

 I also have a really minor fix for SPARK-1057  (upgrading fastutil),
 could that also make it in?

 -Evan


 On Sun, Mar 23, 2014 at 11:01 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Sorry this request is coming in a bit late, but would it be possible to
  backport SPARK-979[1] to branch-0.9 ? This is the patch for randomizing
  executor offers and I would like to use this in a release sooner rather
  than later.
 
  Thanks
  Shivaram
 
  [1]
 
 https://github.com/apache/spark/commit/556c56689bbc32c6cec0d07b57bd3ec73ceb243e#diff-8ef3258646b0e6a4793d6ad99848eacd
 
 
  On Thu, Mar 20, 2014 at 10:18 PM, Bhaskar Dutta bhas...@gmail.com
 wrote:
 
  Thank You! We plan to test out 0.9.1 on YARN once it is out.
 
  Regards,
  Bhaskar
 
  On Fri, Mar 21, 2014 at 12:42 AM, Tom Graves tgraves...@yahoo.com
 wrote:
 
   I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when
 running
   on YARN - JIRA and  [SPARK-1051] On Yarn, executors don't doAs as
   submitting user - JIRA in.  The pyspark one I would consider more of
 an
   enhancement so might not be appropriate for a point release.
  
  
[SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on
 YA...
   org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set at
  
 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
   at org.apache.spark.schedule...
  
  
[SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA
   This means that they can't write/read from files that the yarn user
   doesn't have permissions to but the submitting user does.
  
  
  
  
  
   On Thursday, March 20, 2014 1:35 PM, Bhaskar Dutta bhas...@gmail.com
 
   wrote:
  
   It will be great if
   SPARK-1101https://spark-project.atlassian.net/browse/SPARK-1101:
   Umbrella
   for hardening Spark on YARN can get into 0.9.1.
  
   Thanks,
   Bhaskar
  
  
   On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
   tathagata.das1...@gmail.comwrote:
  
 Hello everyone,
   
Since the release of Spark 0.9, we have received a number of
 important
   bug
fixes and we would like to make a bug-fix release of Spark 0.9.1. We
  are
going to cut a release candidate soon and we would love it if people
  test
it out. We have backported several bug fixes into the 0.9 and
 updated
   JIRA
accordingly
   
  
 
 https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
.
Please let me know if there are fixes that were not backported but
 you
would like to see them in 0.9.1.
   
Thanks!
   
TD
   
  
 



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |




-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: spark jobserver

2014-03-24 Thread Evan Chan
Suhas, here is the update, which I posted to SPARK-818:

An update: we have put up the final job server here:
https://github.com/ooyala/spark-jobserver

The plan is to have a spark-contrib repo/github account and this would
be one of the first projects.

See SPARK-1283 for the ticket to track spark-contrib.

On Sat, Mar 22, 2014 at 6:15 PM, Suhas Satish suhas.sat...@gmail.com wrote:
 Any plans of integrating SPARK-818 into spark trunk ? The pull request is
 open.
 It offers spark as a service with spark jobserver running as a separate
 process.


 Thanks,
 Suhas.



-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Hey Evan and TD,

Spark's dependency graph in a maintenance release seems potentially
harmful, especially upgrading a minor version (not just a patch
version) like this. This could affect other downstream users. For
instance, someone could upgrade without knowing that their fastutil
dependency gets bumped, and then hit some new problem in fastutil 6.5.

- Patrick

On Mon, Mar 24, 2014 at 12:02 AM, Tathagata Das
tathagata.das1...@gmail.com wrote:
 @Shivaram, That is a useful patch, but I am a bit afraid to merge it in.
 Randomizing the executor offers has performance implications, especially for
 Spark Streaming. The non-randomized ordering of allocating machines to tasks
 was subtly helping to speed up certain window-based shuffle operations.  For
 example, corresponding shuffle partitions in multiple shuffles using the
 same partitioner were likely to be co-located; that is, shuffle partition 0
 was likely to be on the same machine for multiple shuffles. While this is
 not a mechanism to rely on, randomization may lead to
 performance degradation. So I am afraid to merge this one without
 understanding the consequences.

 @Evan, I have already cut a release! You can submit the PR and we can merge
 it into branch-0.9. If we have to cut another release, then we can include it.



 On Sun, Mar 23, 2014 at 11:42 PM, Evan Chan e...@ooyala.com wrote:

 I also have a really minor fix for SPARK-1057  (upgrading fastutil),
 could that also make it in?

 -Evan


 On Sun, Mar 23, 2014 at 11:01 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Sorry this request is coming in a bit late, but would it be possible to
  backport SPARK-979[1] to branch-0.9 ? This is the patch for randomizing
  executor offers and I would like to use this in a release sooner rather
  than later.
 
  Thanks
  Shivaram
 
  [1]
 
 https://github.com/apache/spark/commit/556c56689bbc32c6cec0d07b57bd3ec73ceb243e#diff-8ef3258646b0e6a4793d6ad99848eacd
 
 
  On Thu, Mar 20, 2014 at 10:18 PM, Bhaskar Dutta bhas...@gmail.com
 wrote:
 
  Thank You! We plan to test out 0.9.1 on YARN once it is out.
 
  Regards,
  Bhaskar
 
  On Fri, Mar 21, 2014 at 12:42 AM, Tom Graves tgraves...@yahoo.com
 wrote:
 
   I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when
 running
   on YARN - JIRA and  [SPARK-1051] On Yarn, executors don't doAs as
   submitting user - JIRA in.  The pyspark one I would consider more of
 an
   enhancement so might not be appropriate for a point release.
  
  
[SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on
 YA...
   org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set at
  
 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
   at org.apache.spark.schedule...
  
  
[SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA
   This means that they can't write/read from files that the yarn user
   doesn't have permissions to but the submitting user does.
  
  
  
  
  
   On Thursday, March 20, 2014 1:35 PM, Bhaskar Dutta bhas...@gmail.com
 
   wrote:
  
   It will be great if
   SPARK-1101https://spark-project.atlassian.net/browse/SPARK-1101:
   Umbrella
   for hardening Spark on YARN can get into 0.9.1.
  
   Thanks,
   Bhaskar
  
  
   On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
   tathagata.das1...@gmail.comwrote:
  
 Hello everyone,
   
Since the release of Spark 0.9, we have received a number of
 important
   bug
fixes and we would like to make a bug-fix release of Spark 0.9.1. We
  are
going to cut a release candidate soon and we would love it if people
  test
it out. We have backported several bug fixes into the 0.9 and
 updated
   JIRA
accordingly
   
  
 
 https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
.
Please let me know if there are fixes that were not backported but
 you
would like to see them in 0.9.1.
   
Thanks!
   
TD
   
  
 



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |



Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
 Spark's dependency graph in a maintenance
*Modifying* Spark's dependency graph...


Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
Patrick, that is a good point.


On Mon, Mar 24, 2014 at 12:14 AM, Patrick Wendell pwend...@gmail.comwrote:

  Spark's dependency graph in a maintenance
 *Modifying* Spark's dependency graph...



Re: Announcing the official Spark Job Server repo

2014-03-24 Thread andy petrella
Thx for answering!
see inline for my thoughts (or misunderstanding ? ^^)

Andy, doesn't Marathon handle fault tolerance amongst its apps?  ie if
 you say that N instances of an app are running, and one shuts off,
 then it spins up another one no?

Yes indeed, but my question is: how do we know how many instances we need?
You know, it's purely dependent on the amount of resources consumed by the
drivers, so it fluctuates over time.
In my current thinking, the JobServer could ask Mesos for resources
depending on the amount of resources of its currently managed job list (so
the jobs themselves should be able to deliver such info). Then (perhaps)
Marathon can be (hot-)tuned to maintain N+M or N-M instances depending on
the load... But maybe I'm crossing some boundaries, the ones with
auto-scaling :-/
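(One way to make this concrete, as a hedged sketch: derive the desired Marathon instance count from the aggregate resources the currently managed jobs report, plus a fixed margin M of spare instances. The function name, per-instance sizing, and margin here are made-up parameters for illustration, not anything the job server actually implements.)

```python
import math

def desired_instances(job_mem_mb, mem_per_instance_mb=1024, margin=1):
    """Naive scaling heuristic: enough job-server instances to cover the
    memory the managed jobs report needing, plus `margin` spares."""
    needed = math.ceil(sum(job_mem_mb) / mem_per_instance_mb)
    return max(1, needed + margin)

# Three managed jobs reporting 512, 768, and 2048 MB of driver memory.
n = desired_instances([512, 768, 2048])
```

A controller could periodically recompute this from the managed job list and PUT the new `instances` count to Marathon, which is essentially the "hot-tuned" auto-scaling loop described above.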



 The tricky thing was that I was planning to use Akka Cluster to
 coordinate, but Mesos itself can be used to coordinate as well, which
 is an overlap/ but I didn't want ot make job server HA just
 reliant only on Mesos...

You mean using Akka Cluster to dispatch jobs on the managed (Job Server)
nodes? That's actually quite interesting as well, but I guess it would
require duplicating some of the work that Mesos or YARN are doing (that
is, resource management), right?


 Anyways we can discuss offline if needed.

Definitely, let's stop polluting the list !!!


C ya
andy


 On Thu, Mar 20, 2014 at 1:35 AM, andy petrella andy.petre...@gmail.com
 wrote:
  Heya,
  That's cool you've already hacked something for this in the scripts!
 
  I have a related question, how would it work actually. I mean, to have
 this
  Job Server fault tolerant using Marathon, I would guess that it will need
  to be itself a Mesos framework, and able to publish its resources needs.
  And also, for that, the Job Server has to be aware of the resources
 needed
  by the Spark drivers that it will run, which is not as easy to guess,
  unless it is provided by the job itself?
 
  I didn't check the Job Server deeply enough, so it might already be the
 case
  (or I'm expressing something completely dumb ^^).
 
  For sure, we'll try to share it when we'll reach this point to deploy
 using
  marathon (should be planned for April)
 
  greetz and again, Nice Work Evan!
 
  Ndi
 
  On Wed, Mar 19, 2014 at 7:27 AM, Evan Chan e...@ooyala.com wrote:
 
  Andy,
 
  Yeah, we've thought of deploying this on Marathon ourselves, but we're
  not sure how much Mesos we're going to use yet.   (Indeed if you look
  at bin/server_start.sh, I think I set up the PORT environment var
  specifically for Marathon.)This is also why we have deploy scripts
  which package into .tar.gz, again for Mesos deployment.
 
  If you do try this, please let us know.  :)
 
  -Evan
 
 
  On Tue, Mar 18, 2014 at 3:57 PM, andy petrella andy.petre...@gmail.com
 
  wrote:
   tad! That's awesome.
  
   A quick question, does someone have insights regarding having such
   JobServers deployed using Marathon on Mesos?
  
   I'm thinking about an arch where Marathon would deploy and keep the
 Job
   Servers running along with part of the whole set of apps deployed on
 it
   regarding the resources needed (à la Jenkins).
  
   Any idea is welcome.
  
   Back to the news, Evan + Ooyala team: Great Job again.
  
   andy
  
   On Tue, Mar 18, 2014 at 11:39 PM, Henry Saputra 
 henry.sapu...@gmail.com
  wrote:
  
   W00t!
  
   Thanks for releasing this, Evan.
  
   - Henry
  
   On Tue, Mar 18, 2014 at 1:51 PM, Evan Chan e...@ooyala.com wrote:
Dear Spark developers,
   
Ooyala is happy to announce that we have pushed our official, Spark
0.9.0 / Scala 2.10-compatible, job server as a github repo:
   
https://github.com/ooyala/spark-jobserver
   
Complete with unit tests, deploy scripts, and examples.
   
The original PR (#222) on incubator-spark is now closed.
   
Please have a look; pull requests are very welcome.
--
--
Evan Chan
Staff Engineer
e...@ooyala.com  |
  
 
 
 
  --
  --
  Evan Chan
  Staff Engineer
  e...@ooyala.com  |
 



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |



Re: spark jobserver

2014-03-24 Thread Suhas Satish
Thanks a lot for this update, Evan. Really appreciate the effort.

On Monday, March 24, 2014, Evan Chan e...@ooyala.com wrote:

 Suhas, here is the update, which I posted to SPARK-818:

 An update: we have put up the final job server here:
 https://github.com/ooyala/spark-jobserver

 The plan is to have a spark-contrib repo/github account and this would
 be one of the first projects.

 See SPARK-1283 for the ticket to track spark-contrib.

 On Sat, Mar 22, 2014 at 6:15 PM, Suhas Satish 
 suhas.sat...@gmail.comjavascript:;
 wrote:
  Any plans of integrating SPARK-818 into spark trunk ? The pull request is
  open.
  It offers spark as a service with spark jobserver running as a separate
  process.
 
 
  Thanks,
  Suhas.



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com javascript:;  |



-- 
Cheers,
Suhas.


Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
Patrick, yes, that is indeed a risk.

On Mon, Mar 24, 2014 at 12:30 AM, Tathagata Das
tathagata.das1...@gmail.com wrote:
 Patrick, that is a good point.


 On Mon, Mar 24, 2014 at 12:14 AM, Patrick Wendell pwend...@gmail.comwrote:

  Spark's dependency graph in a maintenance
 *Modifying* Spark's dependency graph...




-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: spark jobserver

2014-03-24 Thread Evan Chan
Suhas,

You're welcome.  We are planning to speak about the job server at the
Spark Summit by the way.

-Evan


On Mon, Mar 24, 2014 at 9:38 AM, Suhas Satish suhas.sat...@gmail.com wrote:
 Thanks a lot for this update Evan , really appreciate the effort.

 On Monday, March 24, 2014, Evan Chan e...@ooyala.com wrote:

 Suhas, here is the update, which I posted to SPARK-818:

 An update: we have put up the final job server here:
 https://github.com/ooyala/spark-jobserver

 The plan is to have a spark-contrib repo/github account and this would
 be one of the first projects.

 See SPARK-1283 for the ticket to track spark-contrib.

 On Sat, Mar 22, 2014 at 6:15 PM, Suhas Satish 
 suhas.sat...@gmail.comjavascript:;
 wrote:
  Any plans of integrating SPARK-818 into spark trunk ? The pull request is
  open.
  It offers spark as a service with spark jobserver running as a separate
  process.
 
 
  Thanks,
  Suhas.



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com javascript:;  |



 --
 Cheers,
 Suhas.



-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: new Catalyst/SQL component merged into master

2014-03-24 Thread Usman Ghani
How does it compare against Shark, and what is the future of Shark with
this new module in place?


On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan e...@ooyala.com wrote:

 Hi Michael,

 Congrats, this is really neat!

 What thoughts do you have regarding adding indexing support and
 predicate pushdown to this SQL framework? Right now we have custom
 bitmap indexing to speed up queries, so we're really curious about
 the architectural direction.

 -Evan


 On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust
 mich...@databricks.com wrote:
 
  It will be great if there are any examples or usecases to look at ?
 
  There are examples in the Spark documentation.  Patrick posted and
 updated
  copy here so people can see them before 1.0 is released:
 
 http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
 
  Does this feature has different usecases than shark or more cleaner as
  hive dependency is gone?
 
  Depending on how you use this, there is still a dependency on Hive (By
  default this is not the case.  See the above documentation for more
  details).  However, the dependency is on a stock version of Hive instead
 of
  one modified by the AMPLab.  Furthermore, Spark SQL has its own
 optimizer,
  instead of relying on the Hive optimizer.  Long term, this is going to
 give
  us a lot more flexibility to optimize queries specifically for the Spark
  execution engine.  We are actively porting over the best parts of shark
  (specifically the in-memory columnar representation).
 
  Shark still has some features that are missing in Spark SQL, including
  SharkServer (and years of testing).  Once SparkSQL graduates from Alpha
  status, it'll likely become the new backend for Shark.



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |



Re: new Catalyst/SQL component merged into master

2014-03-24 Thread Michael Armbrust
Hi Evan,

Index support is definitely something we would like to add, and it is
possible that adding support for your custom indexing solution would not be
too difficult.

We already push predicates into hive table scan operators when the
predicates are over partition keys.  You can see an example of how we
 collect filters and decide which can be pushed into the scan using the
 HiveTableScan query planning strategy:
 https://github.com/marmbrus/spark/blob/0ae86cfcba56b700d8e7bd869379f0c663b21c1e/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L56
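(For readers outside the code, a toy sketch of the decision being made there, not Spark SQL's actual classes: predicates that reference only partition keys can be pushed into the scan, while the rest must remain in a Filter above it. Column and predicate names below are illustrative.)

```python
# Hypothetical sketch of partition-predicate pushdown classification.
# Each predicate is paired with the list of columns it references.
def split_pushdown(predicates, partition_keys):
    pushed, residual = [], []
    for cols, pred in predicates:
        if set(cols) <= set(partition_keys):
            pushed.append(pred)    # evaluable from partition metadata alone
        else:
            residual.append(pred)  # needs row data; stays in a Filter node
    return pushed, residual

pushed, residual = split_pushdown(
    [(["ds"], "ds = '2014-03-24'"), (["user_id"], "user_id > 100")],
    partition_keys=["ds"],
)
```

Pushing the partition-key predicate into the scan means whole partitions are skipped before any rows are read, which is where most of the win comes from.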

I'd like to know more about your indexing solution.  Is this something
publicly available?  One concern here is that the query planning code is
not considered a public API and so is likely to change quite a bit as we
 improve the optimizer.  It's not currently something that we plan to expose
for external components to modify.

Michael


On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan e...@ooyala.com wrote:

 Hi Michael,

 Congrats, this is really neat!

 What thoughts do you have regarding adding indexing support and
 predicate pushdown to this SQL framework? Right now we have custom
 bitmap indexing to speed up queries, so we're really curious about
 the architectural direction.

 -Evan


 On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust
 mich...@databricks.com wrote:
 
  It will be great if there are any examples or usecases to look at ?
 
  There are examples in the Spark documentation.  Patrick posted and
 updated
  copy here so people can see them before 1.0 is released:
 
 http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
 
  Does this feature has different usecases than shark or more cleaner as
  hive dependency is gone?
 
  Depending on how you use this, there is still a dependency on Hive (By
  default this is not the case.  See the above documentation for more
  details).  However, the dependency is on a stock version of Hive instead
 of
  one modified by the AMPLab.  Furthermore, Spark SQL has its own
 optimizer,
  instead of relying on the Hive optimizer.  Long term, this is going to
 give
  us a lot more flexibility to optimize queries specifically for the Spark
  execution engine.  We are actively porting over the best parts of shark
  (specifically the in-memory columnar representation).
 
  Shark still has some features that are missing in Spark SQL, including
  SharkServer (and years of testing).  Once SparkSQL graduates from Alpha
  status, it'll likely become the new backend for Shark.



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |



Re: Spark 0.9.1 release

2014-03-24 Thread Kevin Markey

1051 is essential!
I'm not sure about the others, but anything that adds stability to 
Spark/Yarn would  be helpful.

Kevin Markey


On 03/20/2014 01:12 PM, Tom Graves wrote:

I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on 
YARN - JIRA and  [SPARK-1051] On Yarn, executors don't doAs as submitting user 
- JIRA in.  The pyspark one I would consider more of an enhancement so might 
not be appropriate for a point release.

  
  [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YA...

org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
 at org.apache.spark.schedule...
  
  
  [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA

This means that they can't write/read from files that the yarn user doesn't 
have permissions to but the submitting user does.
  
  




On Thursday, March 20, 2014 1:35 PM, Bhaskar Dutta bhas...@gmail.com wrote:
  
It will be great if

SPARK-1101https://spark-project.atlassian.net/browse/SPARK-1101:
Umbrella
for hardening Spark on YARN can get into 0.9.1.

Thanks,
Bhaskar


On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
tathagata.das1...@gmail.comwrote:


   Hello everyone,

Since the release of Spark 0.9, we have received a number of important bug
fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
going to cut a release candidate soon and we would love it if people test
it out. We have backported several bug fixes into the 0.9 and updated JIRA
accordingly
https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)

.

Please let me know if there are fixes that were not backported but you
would like to see them in 0.9.1.

Thanks!

TD





Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
1051 has been pulled in!

search 1051 in
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-0.9

TD

On Mon, Mar 24, 2014 at 4:26 PM, Kevin Markey kevin.mar...@oracle.com wrote:
 1051 is essential!
 I'm not sure about the others, but anything that adds stability to
 Spark/Yarn would  be helpful.
 Kevin Markey



 On 03/20/2014 01:12 PM, Tom Graves wrote:

 I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running
 on YARN - JIRA and  [SPARK-1051] On Yarn, executors don't doAs as submitting
 user - JIRA in.  The pyspark one I would consider more of an enhancement so
 might not be appropriate for a point release.

 [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on
 YA...
 org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set at
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
 at org.apache.spark.schedule...
   [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA
 This means that they can't write/read from files that the yarn user
 doesn't have permissions to but the submitting user does.



 On Thursday, March 20, 2014 1:35 PM, Bhaskar Dutta bhas...@gmail.com
 wrote:
   It will be great if
 SPARK-1101https://spark-project.atlassian.net/browse/SPARK-1101:
 Umbrella
 for hardening Spark on YARN can get into 0.9.1.

 Thanks,
 Bhaskar


 On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
 tathagata.das1...@gmail.comwrote:

Hello everyone,

 Since the release of Spark 0.9, we have received a number of important
 bug
 fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
 going to cut a release candidate soon and we would love it if people test
 it out. We have backported several bug fixes into the 0.9 branch and updated
 JIRA accordingly:
 https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)

 Please let me know if there are fixes that were not backported but that you
 would like to see in 0.9.1.

 Thanks!

 TD




Re: Spark 0.9.1 release

2014-03-24 Thread Kevin Markey
Is there any way that [SPARK-782] (Shade ASM) can be included?  I see 
that it is not currently backported to 0.9, but no single issue has 
caused us more grief as we integrate spark-core with other project 
dependencies.  There are far too many libraries out there, in Spark 0.9 
and earlier as well as elsewhere, that are not well-behaved (the ASM FAQ 
recommends shading), including some Hive and Hadoop libraries and a 
number of servlet libraries.  We can't control those, but if Spark were 
well-behaved in this regard, it would help.  Even for a maintenance 
release, and even if 1.0 is only 6 weeks away!


(For those not following 782, according to Jira comments, the SBT build 
shades it, but it is the Maven build that ends up in Maven Central.)
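
For readers following along: the fix being discussed amounts to relocating
ASM's packages at build time so that Spark's copy of ASM cannot collide with
another ASM version on a downstream classpath. A minimal sketch of what that
could look like with the maven-shade-plugin; the relocated package prefix
here is illustrative, not the actual Spark build configuration:

```xml
<!-- Hypothetical pom.xml fragment: rewrite ASM class/package names inside
     the shaded jar so downstream projects can use any ASM version freely. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.objectweb.asm</pattern>
            <!-- Illustrative target package; the real build may choose another. -->
            <shadedPattern>org.spark-project.asm</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```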


Thanks
Kevin Markey



On 03/19/2014 06:07 PM, Tathagata Das wrote:

  Hello everyone,

Since the release of Spark 0.9, we have received a number of important bug
fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
going to cut a release candidate soon and we would love it if people test
it out. We have backported several bug fixes into the 0.9 branch and updated JIRA
accordingly:
https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
Please let me know if there are fixes that were not backported but you
would like to see them in 0.9.1.

Thanks!

TD





Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
Hello Kevin,

A fix for SPARK-782 would definitely simplify building against Spark.
However, it's possible that a fix for this issue in 0.9.1 would break
the builds of existing 0.9 users that reference Spark, either due to
a change in the ASM version, or because the fix is incompatible with
their current workarounds for this issue. That is not a good idea for a
maintenance release, especially when 1.0 is not too far away.

Can you (and others) elaborate on the workarounds that you currently
have for this issue? It's best to understand all the implications
of this fix.

Note that in branch 0.9, it is not fixed, neither in SBT nor in Maven.
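
(For context: the workarounds in question typically pin or exclude the
conflicting ASM artifact in the downstream build so that only one ASM
version is resolved. A sketch of one such workaround, with hypothetical
dependency coordinates, assuming a Maven build:)

```xml
<!-- Hypothetical downstream pom.xml fragment: exclude the ASM jar that
     another dependency drags in, leaving a single ASM on the classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
  <exclusions>
    <exclusion>
      <groupId>asm</groupId>
      <artifactId>asm</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```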

TD

On Mon, Mar 24, 2014 at 4:38 PM, Kevin Markey kevin.mar...@oracle.com wrote:
 Is there any way that [SPARK-782] (Shade ASM) can be included?  I see that
 it is not currently backported to 0.9.  But there is no single issue that
 has caused us more grief as we integrate spark-core with other project
 dependencies.  There are way too many libraries out there in addition to
 Spark 0.9 and before that are not well-behaved (ASM FAQ recommends shading),
 including some Hive and Hadoop libraries and a number of servlet libraries.
 We can't control those, but if Spark were well behaved in this regard, it
 would help.  Even for a maintenance release, and even if 1.0 is only 6 weeks
 away!

 (For those not following 782, according to Jira comments, the SBT build
 shades it, but it is the Maven build that ends up in Maven Central.)

 Thanks
 Kevin Markey




 On 03/19/2014 06:07 PM, Tathagata Das wrote:

   Hello everyone,

 Since the release of Spark 0.9, we have received a number of important bug
 fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
 going to cut a release candidate soon and we would love it if people test
 it out. We have backported several bug fixes into the 0.9 branch and updated JIRA
 accordingly:
 https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)

 Please let me know if there are fixes that were not backported but you
 would like to see them in 0.9.1.

 Thanks!

 TD