[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1106#issuecomment-54931144
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20028/consoleFull) for PR 1106 at commit [`b6560cf`](https://github.com/apache/spark/commit/b6560cff6d31d7335ed6317db6c60b5900b44802).
 * This patch merges cleanly.





[GitHub] spark pull request: [Build] Removed -Phive-thriftserver since this...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2269#issuecomment-54931135
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20025/consoleFull) for PR 2269 at commit [`08617bd`](https://github.com/apache/spark/commit/08617bd7eeadd639ff10ee1a2b6f5d37ee2123f2).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3404 [BUILD] SparkSubmitSuite fails with...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2328#issuecomment-54931141
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20027/consoleFull) for PR 2328 at commit [`512d782`](https://github.com/apache/spark/commit/512d7827f253b5785488578f8906c6c9adbc125d).
 * This patch merges cleanly.





[GitHub] spark pull request: [Docs] actorStream storageLevel default is MEM...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2319#issuecomment-54931143
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20026/consoleFull) for PR 2319 at commit [`7b6ce68`](https://github.com/apache/spark/commit/7b6ce6895ece70a97f725941da297c96530d115f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2230#issuecomment-54931137
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20024/consoleFull) for PR 2230 at commit [`ca43e6d`](https://github.com/apache/spark/commit/ca43e6d5fd38f859256edce1d8a8b108490516e7).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3404 [BUILD] SparkSubmitSuite fails with...

2014-09-08 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/2328

SPARK-3404 [BUILD] SparkSubmitSuite fails with "spark-submit exits with code 1"

This fixes the `SparkSubmitSuite` failure by setting `spark.ui.port=0` in the Maven build, to match the SBT build. This avoids a port conflict which causes failures.

(This also updates the `scalatest` plugin off of a release candidate, to the identical final release.)
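
For reference, a sketch of what such a change could look like in the pom, assuming the scalatest-maven-plugin's `systemProperties` configuration (the actual Spark pom may differ):

```xml
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <configuration>
    <systemProperties>
      <!-- Port 0 makes each test's SparkUI bind an ephemeral port,
           so concurrent suites cannot collide on a fixed port. -->
      <spark.ui.port>0</spark.ui.port>
    </systemProperties>
  </configuration>
</plugin>
```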

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-3404

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2328


commit 512d7827f253b5785488578f8906c6c9adbc125d
Author: Sean Owen 
Date:   2014-09-09T06:43:27Z

Set spark.ui.port=0 in Maven scalatest config to match SBT build and avoid SparkSubmitSuite failure due to port conflict







[GitHub] spark pull request: [SPARK-3393] [SQL] add configuration template ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2263#issuecomment-54930604
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20023/consoleFull) for PR 2263 at commit [`e027d23`](https://github.com/apache/spark/commit/e027d23f6fc59da8ff6173de4a8efab170de3959).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3193] Output error info when Process exi...

2014-09-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2108#issuecomment-54930425
  
@andrewor14 @JoshRosen Bingo! Adding `spark.ui.port=0` to the Maven build makes `SparkSubmitSuite` pass for me where it failed before. This was already set in the SBT build. I'll submit a PR against https://issues.apache.org/jira/browse/SPARK-3404





[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2327#issuecomment-54929543
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20020/consoleFull) for PR 2327 at commit [`269bd78`](https://github.com/apache/spark/commit/269bd78bb3c7efb7ca24d08bade534d459a4f74a).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
  * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
  * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
  * `  class Encoder extends compression.Encoder[IntegerType.type] `
  * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
  * `  class Encoder extends compression.Encoder[LongType.type] `
  * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`






[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/1106#issuecomment-54928936
  
@markhamstra Thanks for pointing out the shuffle problem. I tested it, and the result proved that you are right on this.
As for the question about keeping non-alive workers: I thought the filter would create a copy of the workers, so I decided to filter during iteration. It is a time-space tradeoff.
Now, since the worker HashSet has to be converted to a Seq before the shuffle anyway, we could put the filter before the shuffle, as you said.
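
Concretely, the approach being agreed on here could look like this (a sketch using the names from the `Master.scala` diff; see @markhamstra's comment below for the full reasoning):

```scala
// Drop non-alive workers first, then convert the HashSet to a Seq so that
// the shuffle actually reorders something, and shuffle only live workers.
val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
```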






[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1106#discussion_r17283522
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -481,13 +481,27 @@ private[spark] class Master(
     if (state != RecoveryState.ALIVE) { return }

     // First schedule drivers, they take strict precedence over applications
-    val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
-    for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
-      for (driver <- List(waitingDrivers: _*)) { // iterate over a copy of waitingDrivers
-        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
+    // Randomization helps balance drivers
+    val shuffledWorkers = Random.shuffle(workers).toArray
--- End diff --

Fixed prior example





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-08 Thread bgreeven
Github user bgreeven commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-54927726
  
Thanks for your feedback. Your points are very helpful indeed.

Here is my response:

  1.  The user guide is for normal users and it should focus on how to use ANN. If we want to leave some notes for developers, we can append a section at the end.
[bgreeven]: Sure. I think the user guide needs a lot of revision anyway, but as you said, it is better to wait until the code is more stable before updating the user guide.

  2.  We don't ask users to treat unit tests as demos or examples. Instead, we put a short code snippet in the user guide and put a complete example under examples/.
[bgreeven]: OK, I'll see how to convert the demo into a unit test.

  3.  GeneralizedModel and GeneralizedAlgorithm are definitely out of the scope of this PR and they should not live under mllib.ann. We can make a separate JIRA to discuss the APIs. Could you remove them in this PR?

  4.  predict outputs the prediction for the first node. Would the first node be the only special node? How about having predict(v) output the full prediction and predict(v, i) output the prediction for the i-th node?

[bgreeven]: I certainly understand your concerns on points 3 and 4. My reason for adding GeneralizedModel and GeneralizedAlgorithm was that I see more uses for ANNs than classification only. A LabeledPoint implementation would restrict the output to essentially a one-dimensional value. If you want to learn e.g. a multidimensional function (such as in the demo), then you need something more general than LabeledPoint.

The architecture of taking only the first element of an output vector is for legacy reasons. GeneralizedLinearModel (on which GeneralizedModel was modelled) as well as ClassificationModel only output a one-dimensional value, hence I made the interface of predict(v) the same and created a separate function predictV(v) to output the multidimensional result.

I think we can indeed open a second JIRA to discuss this, since there can also be other uses for multidimensional output than just classification.

  5.  Could you try to use LBFGS instead of GradientDescent?
[bgreeven]: Tried it, and that works too. Actually, I would like to make the code more flexible, to allow for replacing the optimisation function. There is a lot of research on (parallelisation of) training ANNs, so the future may bring better optimisation strategies, and it should be easy to plug those into the existing code.

  6.  Please replace for loops with while loops. The latter are faster in Scala.
[bgreeven]: Makes sense. Will do so. (See the sketch after this list.)

  7.  Please follow the Spark Code Style Guide and update the style, e.g.: a. remove space after ( and before ); b. add ScalaDoc for all public classes and methods; c. line width should be smaller than 100 chars (in both main and test); d. some verification code is left as comments, please find a way to move it to unit tests; e. organize imports into groups and order them alphabetically within each group; f. do not add return or ; unless they are necessary.
[bgreeven]: OK, I can do that. By the way, it seems that the Spark Code Style Guide is missing some rules. I would be happy to volunteer to expand the Style Guide, also since "sbt/sbt scalastyle" enforces some rules (such as mandatory spaces before and after '+') that are not mentioned in the Style Guide.

  8.  Please use existing unit tests as templates. For example, please rename TestParallelANN to ParallelANNSuite and use LocalSparkContext and FunSuite for Spark setup and test assertions. Remove main() from unit tests.
[bgreeven]: OK, I will look at this and see how to convert the demo into a unit test.

  9.  Is Parallel necessary in the name ParallelANN?
[bgreeven]: Not really. Better naming is desirable indeed.

  10. What methods and classes are public? Could they be private or package private? Please generate the API doc and check the public ones.
[bgreeven]: Yes, I found out about this too. Some classes and methods need to be made public, as they currently cannot be accessed from outside. Maybe adding a Scala object as the interface (as is done in Alexander's code) is indeed better.
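
As a side note on point 6, a minimal self-contained sketch of the for-vs-while point (illustrative only, not code from this PR):

```scala
// In Scala 2.10, `for (i <- 0 until n)` desugars to Range.foreach with a
// closure invoked per element; a hand-written while loop avoids that
// overhead in hot inner loops such as ANN weight updates.
def sumWhile(values: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    sum += values(i)
    i += 1
  }
  sum
}
```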






[GitHub] spark pull request: [Build] Removed -Phive-thriftserver since this...

2014-09-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2269#issuecomment-54927630
  
Jenkins, test this please.





[GitHub] spark pull request: [Docs] actorStream storageLevel default is MEM...

2014-09-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2319#issuecomment-54927586
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1106#issuecomment-54927143
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20022/consoleFull) for PR 1106 at commit [`f674e59`](https://github.com/apache/spark/commit/f674e591c11a38610ab0bd5747e61ab1f654f26c).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3393] [SQL] add configuration template ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2263#issuecomment-54927090
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20023/consoleFull) for PR 2263 at commit [`e027d23`](https://github.com/apache/spark/commit/e027d23f6fc59da8ff6173de4a8efab170de3959).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1106#discussion_r17283247
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -481,13 +481,27 @@ private[spark] class Master(
     if (state != RecoveryState.ALIVE) { return }

     // First schedule drivers, they take strict precedence over applications
-    val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
-    for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
-      for (driver <- List(waitingDrivers: _*)) { // iterate over a copy of waitingDrivers
-        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
+    // Randomization helps balance drivers
+    val shuffledWorkers = Random.shuffle(workers).toArray
--- End diff --

This just looks wrong or pointless to me. Because `workers` is a HashSet, it's not ordered. Logically, shuffling a Set (i.e., changing the order in which elements are added to the Set) just gets you the same Set back again. I think that is also what actually happens with a HashSet:
```
scala> import scala.util.Random
import scala.util.Random

scala> val aHashSet = scala.collection.mutable.HashSet(...)
aHashSet: scala.collection.mutable.HashSet[...] = Set(...)

scala> val shuffledSet = Random.shuffle(aHashSet)
shuffledSet: scala.collection.mutable.HashSet[...] = Set(...)

scala> { for (i <- 0 until aHashSet.size) yield aHashSet(i) == shuffledSet(i) }.forall(_ == true)
res0: Boolean = true
```
To actually accomplish the reordering, you need to convert to a Seq before doing the shuffling:
```scala
val shuffledWorkers = Random.shuffle(workers.toSeq)  // yields an ArrayBuffer
```

And I'll ask again, what is the point of including workers whose state is not `WorkerState.ALIVE` and going through the trouble of shuffling in these workers that will only be filtered out later? I really think that what you want is:
```scala
val shuffledWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
```
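
For what it's worth, a self-contained sketch of the behaviour described above (Scala 2.10 collections; the hypothetical worker names are illustrative):

```scala
import scala.util.Random
import scala.collection.mutable

object ShuffleDemo extends App {
  val workers = mutable.HashSet("w1", "w2", "w3", "w4", "w5")

  // Shuffling the HashSet just builds another HashSet: iteration order is
  // driven by hashing rather than insertion order, so nothing observable
  // changes and an equal Set comes back.
  val shuffledSet = Random.shuffle(workers)
  println(workers == shuffledSet)          // true

  // Converting to a Seq first makes the shuffle observable.
  val shuffledSeq = Random.shuffle(workers.toSeq)
  println(shuffledSeq)                     // e.g. ArrayBuffer(w3, w1, w5, w2, w4)
}
```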






[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-54926751
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20021/consoleFull) for PR 1616 at commit [`03ed3a8`](https://github.com/apache/spark/commit/03ed3a839a83fdd658ee62a389cad1805c221cc3).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3395] [SQL] DSL sometimes incorrectly r...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2266#issuecomment-54926453
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20016/consoleFull) for PR 2266 at commit [`7f2b6f0`](https://github.com/apache/spark/commit/7f2b6f000cdcbe00ed138e96c0cca5bd0623a705).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-54926273
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20018/consoleFull) for PR 2214 at commit [`a1ad308`](https://github.com/apache/spark/commit/a1ad308426d385f2b4b764e3750fc13512bec408).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3393] [SQL] add configuration template ...

2014-09-08 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2263#issuecomment-54925323
  
Thank you @liancheng , I've updated the code.





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-08 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2284#issuecomment-54924719
  
Thank you guys for the explanation and voting; the boxing/unboxing is quite an annoying problem for performance. But from the normal developer's point of view, the `Row` API is the key to interacting with SparkSQL, and complete data type support for the getters/setters (11 primitive data types currently) may make more sense to people.

And if we use the generic type here, people may be confused about what the Scala/Java object type is when the data type specified via `schema` is `Timestamp`, and they may even add an object of `java.security.Timestamp` for the data type `Timestamp`.

Sorry, I probably missed some of the original discussions on the row API design.





[GitHub] spark pull request: [SPARK-3176] Implement 'ABS and 'LAST' for sql

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2099#issuecomment-54924579
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20019/consoleFull) for PR 2099 at commit [`71d15e7`](https://github.com/apache/spark/commit/71d15e7eb757e32e6fa0c47425905f7cd58d9bee).
 * This patch **fails** unit tests.
 * This patch **does not** merge cleanly!






[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...

2014-09-08 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2318#discussion_r17282239
  
--- Diff: assembly/pom.xml ---
@@ -26,7 +26,7 @@
   <groupId>org.apache.spark</groupId>
-  <artifactId>spark-assembly_2.10</artifactId>
+  <artifactId>spark-assembly</artifactId>
--- End diff --

I just checked: the published pom has just this inside it.
```xml
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
    xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0-SNAPSHOT</version>
  <description>POM was created from install:install-file</description>
</project>
```

No dependency information; I am not sure we can live without that.





[GitHub] spark pull request: [SPARK-3393] [SQL] add configuration template ...

2014-09-08 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2263#issuecomment-54923120
  
I'd agree that at least `hive-log4j.properties.template` should be good to 
have. Partly because I myself had once been confused by this a lot...





[GitHub] spark pull request: [SPARK-3393] [SQL] add configuration template ...

2014-09-08 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2263#issuecomment-54922757
  
@marmbrus Those files are just a hint for users if they want to change some of the default settings. Probably not everybody knows which files exactly they should put under the `conf/` folder, e.g.:

`hive-log4j.properties.template` => `hive-log4j.properties` (not `log4j.properties`; see `https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L81`)
`hive-site.xml.template` is for people familiar with `Shark`, who are probably confused about what the Hive `dependency` means. In `Shark`, `Hive` is also required to be installed, and the `hive-site.xml` is placed under `$HIVE/conf`.
`configuration.xsl` is the XML stylesheet for displaying `hive-site.xml` in a browser (http://www.w3.org/Style/XSL/WhatIsXSL.html).

I agree it's probably more clutter if we add those 3 files, but at least we should keep `hive-log4j.properties.template`, or we have to change the code in `SparkSQLCLIDriver.scala` to load `log4j.properties` instead of `hive-log4j.properties`. What do you think?





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-08 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2284#issuecomment-54922678
  
Yea getAs[T] sounds good.





[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

2014-09-08 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2327#discussion_r17281621
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificRow.scala ---
@@ -227,7 +227,9 @@ final class SpecificMutableRow(val values: Array[MutableValue]) extends MutableRow {
     new SpecificMutableRow(newValues)
   }

-  override def update(ordinal: Int, value: Any): Unit = values(ordinal).update(value)
+  override def update(ordinal: Int, value: Any) {
+    if (value == null) setNullAt(ordinal) else values(ordinal).update(value)
+  }
--- End diff --

This change has been submitted separately in #2325, since this PR may take longer to finish.





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-08 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2284#issuecomment-54922492
  
@marmbrus I vote for `getAs[T](i: Int)`.

@chenghao-intel As Michael has just said, avoiding boxing cost is the major design rationale behind these setters/getters. Unfortunately, we haven't fully leveraged this design, and boxing still happens on some critical paths, for example when building/accessing in-memory columnar buffers. PR #2327 is an attempt to (partially) solve this problem.





[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

2014-09-08 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/2327

[SPARK-3294][SQL] WIP: eliminates boxing costs from in-memory columnar 
storage

This is a major refactoring of the in-memory columnar storage implementation, aiming to eliminate boxing costs as much as possible. The basic idea is to refactor all major interfaces into a row-based form and use them together with `SpecificMutableRow`. The difficult part is how to adapt all compression schemes, esp. `RunLengthEncoding` and `DictionaryEncoding`, to this design. Since in-memory compression is disabled by default for now, and this PR should be strictly better than before whether or not in-memory compression is enabled, maybe I'll finish that part in another PR.
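
As a toy illustration of the boxing problem this PR attacks (hypothetical names, not the PR's code):

```scala
object BoxingSketch extends App {
  // A generic Any-based slot boxes every primitive it stores:
  // assigning an Int to Any allocates a java.lang.Integer.
  val generic = new Array[Any](1)
  generic(0) = 42                       // boxed

  // A mutable slot specialized to one primitive type keeps the value unboxed.
  final class MutableIntSlot {
    var value: Int = 0
    var isNull: Boolean = true
    def update(v: Int): Unit = { value = v; isNull = false }  // no allocation
  }
  val slot = new MutableIntSlot
  slot.update(42)
  println((generic(0), slot.value))
}
```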

TODO

- [ ] Benchmark
- [ ] Eliminate boxing costs in `RunLengthEncoding`
- [ ] Eliminate boxing costs in `DictionaryEncoding` (not easy to do without specializing `DictionaryEncoding` for every supported column type)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark prevent-boxing/unboxing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2327


commit 7fb1ac67048114b0cf14e7d9bcbf86d544f72fa9
Author: Cheng Lian 
Date:   2014-09-08T22:14:33Z

Made some in-memory columnar storage interfaces row-based

commit e6cf2647789881d3cc7ced7a44407aa467e7a62e
Author: Cheng Lian 
Date:   2014-09-08T23:57:31Z

Removes boxing cost in IntDelta and LongDelta by providing specialized implementations

commit f338236d4ef14c39084df3ff23d1733eaf8cd7db
Author: Cheng Lian 
Date:   2014-09-09T01:13:10Z

Makes ColumnAccessor.extractSingle row based

commit 1d7d1443339e99d17074ef731e8fedb4985d9f63
Author: Cheng Lian 
Date:   2014-09-09T01:25:10Z

Made compression decoder row based

commit 9c5fae6987b283875bd9eaf315cdaebc06abe45a
Author: Cheng Lian 
Date:   2014-09-09T01:49:32Z

Added row based ColumnType.append/extract

commit 269bd78bb3c7efb7ca24d08bade534d459a4f74a
Author: Cheng Lian 
Date:   2014-09-09T02:12:16Z

Use SpecificMutableRow in InMemoryColumnarTableScan to avoid boxing







[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-08 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-54921964
  
@marmbrus Not ready yet; I haven't been able to debug it since Jenkins has been quite crazy these days. I'll remove the WIP tag once it's ready.





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-54921459
  
retest this please





[GitHub] spark pull request: [SPARK-3193] Output error info when Process exi...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2108#discussion_r17281106
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -869,6 +871,7 @@ private[spark] object Utils extends Logging {
     val exitCode = process.waitFor()
     stdoutThread.join()   // Wait for it to finish reading output
     if (exitCode != 0) {
+      logError(s"Process $command exited with code $exitCode: ${output.toString}")
--- End diff --

You actually don't need `toString` here (not a huge deal)





[GitHub] spark pull request: [SPARK-3193] Output error info when Process exi...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2108#discussion_r17281080
  
--- Diff: core/src/test/scala/org/apache/spark/DriverSuite.scala ---
@@ -18,9 +18,9 @@
 package org.apache.spark
 
 import java.io.File
+import java.util.Properties
--- End diff --

I don't think this import is used.





[GitHub] spark pull request: [SPARK-3193] Output error info when Process exi...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2108#discussion_r17281083
  
--- Diff: core/src/test/scala/org/apache/spark/DriverSuite.scala ---
@@ -18,9 +18,9 @@
 package org.apache.spark
 
 import java.io.File
+import java.util.Properties
 
-import org.apache.log4j.Logger
-import org.apache.log4j.Level
+import org.apache.log4j.{PropertyConfigurator, Logger, Level}
--- End diff --

same here?





[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54921279
  
Thanks, I merged this.





[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1360





[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-54921004
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20018/consoleFull) for PR 2214 at commit [`a1ad308`](https://github.com/apache/spark/commit/a1ad308426d385f2b4b764e3750fc13512bec408).
 * This patch merges cleanly.





[GitHub] spark pull request: [Minor] rat exclude dependency-reduced-pom.xml

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2326#issuecomment-54920999
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20017/consoleFull) for PR 2326 at commit [`860904e`](https://github.com/apache/spark/commit/860904e96c7a4e06adc80e36163891f9b6f9175d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3395] [SQL] DSL sometimes incorrectly r...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2266#issuecomment-54921001
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20016/consoleFull) for PR 2266 at commit [`7f2b6f0`](https://github.com/apache/spark/commit/7f2b6f000cdcbe00ed138e96c0cca5bd0623a705).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3176] Implement 'ABS and 'LAST' for sql

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2099#issuecomment-54921008
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20019/consoleFull) for PR 2099 at commit [`71d15e7`](https://github.com/apache/spark/commit/71d15e7eb757e32e6fa0c47425905f7cd58d9bee).
 * This patch **does not** merge cleanly!





[GitHub] spark pull request: [SPARK-3448][SQL] Check for null in SpecificMu...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2325#issuecomment-54920194
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20008/consoleFull) for PR 2325 at commit [`9366c44`](https://github.com/apache/spark/commit/9366c44ad6c9f65d074b93fce96ec6c5b6b17ad6).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17280425
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag](
     }).toArray
   }

+  def getSchema: Seq[(String, Int, Boolean)] = {
+    if (null != schema) {
+      return schema
+    }
+
+    val conn = getConnection()
+    val stmt = conn.prepareStatement(sql)
+    val metadata = stmt.getMetaData
+    try {
+      if (null != stmt && !stmt.isClosed()) {
+        stmt.close()
+      }
+    } catch {
+      case e: Exception => logWarning("Exception closing statement", e)
+    }
+    schema = Seq[(String, Int, Boolean)]()
+    for (i <- 1 to metadata.getColumnCount) {
+      schema :+= (
--- End diff --

Are there any thread safety concerns regarding mutating schema here?





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17280404
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag](
     }).toArray
   }

+  def getSchema: Seq[(String, Int, Boolean)] = {
+    if (null != schema) {
+      return schema
+    }
+
+    val conn = getConnection()
--- End diff --

Is this connection guaranteed to get closed? It won't benefit from the addOnCompleteCallback below, for instance.
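
A minimal standalone sketch of the reviewer's point (a hypothetical helper, not the PR's code): close the statement and the connection even when reading the metadata throws.

```scala
import java.sql.{Connection, ResultSetMetaData}

object SchemaProbe {
  def probeSchema(getConnection: () => Connection, sql: String): Seq[(String, Int, Boolean)] = {
    val conn = getConnection()
    try {
      val stmt = conn.prepareStatement(sql)
      try {
        val md = stmt.getMetaData
        // (column name, java.sql.Types constant, nullable) per column
        (1 to md.getColumnCount).map { i =>
          (md.getColumnName(i), md.getColumnType(i),
            md.isNullable(i) == ResultSetMetaData.columnNullable)
        }
      } finally stmt.close()   // closed even if getMetaData throws
    } finally conn.close()     // connection likewise
  }
}
```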





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-08 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-54919185
  
Hey, I had one other thought about the design here. We can do this in a subsequent patch, but it could be nice to make `TaskLocation` a case class and have it be serializable to/from a string.

In the past we did this in places where we had to go through a string but wanted to add type safety:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala

Then you can have subclasses for each one that we need, such as `ExecutorCacheTaskLocation`, `HDFSCacheTaskLocation`, and `HostTaskLocation`.

And they can have prefixes similar to `BlockId`'s, and if there is no prefix it will be treated as a `HostTaskLocation`.
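
A minimal sketch of that design (assumed names following the `BlockId` pattern the comment cites, not actual Spark code):

```scala
// Task locations as case classes that round-trip through strings; strings
// without a known prefix are treated as plain host locations.
sealed trait TaskLocation { def host: String }

case class HostTaskLocation(host: String) extends TaskLocation {
  override def toString: String = host
}
case class HDFSCacheTaskLocation(host: String) extends TaskLocation {
  override def toString: String = "hdfs_cache_" + host
}
case class ExecutorCacheTaskLocation(host: String, executorId: String) extends TaskLocation {
  override def toString: String = s"executor_${host}_$executorId"
}

object TaskLocation {
  def apply(str: String): TaskLocation =
    if (str.startsWith("hdfs_cache_")) HDFSCacheTaskLocation(str.stripPrefix("hdfs_cache_"))
    else HostTaskLocation(str)
}
```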






[GitHub] spark pull request: [Minor] rat exclude dependency-reduced-pom.xml

2014-09-08 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/2326

[Minor] rat exclude dependency-reduced-pom.xml



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark rat-excludes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2326


commit 860904e96c7a4e06adc80e36163891f9b6f9175d
Author: GuoQiang Li 
Date:   2014-09-09T03:09:32Z

rat exclude dependency-reduced-pom.xml







[GitHub] spark pull request: [SPARK-2491] Don't handle uncaught exceptions ...

2014-09-08 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1482#issuecomment-54917849
  
@aarondav I understand what you mean; I will submit the relevant code tomorrow.
BTW, most of the OOMs occur during the deserialization process.





[GitHub] spark pull request: [SPARK-3176] Implement 'ABS and 'LAST' for sql

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2099#issuecomment-54917634
  
ok to test





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2284#issuecomment-54917551
  
I'm not sure we want to do this.  The specific getters / setters are really 
only so that we can avoid boxing primitives.   We added getString mostly 
because at one point we were considering having some kind of internal mutable 
backing data structure here for performance (and we still might).

What do you think about something like `def getAs[T](i: Int): T = apply(i).asInstanceOf[T]` in `Row`?

/cc @liancheng @rxin
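
For illustration, a minimal sketch of how such a generic accessor behaves, 
using a hypothetical `SimpleRow` stand-in rather than the real `Row` trait:

```scala
// Simplified stand-in for Row, backed by Seq[Any].
class SimpleRow(values: Seq[Any]) {
  def apply(i: Int): Any = values(i)
  // Generic accessor: casts the boxed value at position i.
  // Convenient, but it boxes primitives, unlike getInt / getLong etc.
  def getAs[T](i: Int): T = apply(i).asInstanceOf[T]
}

object GetAsDemo extends App {
  val row = new SimpleRow(Seq("spark", 42))
  val name = row.getAs[String](0)
  val count = row.getAs[Int](1)
  println(s"$name / $count")
}
```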






[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279806
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag](
   logInfo("statement fetch size set to: " + stmt.getFetchSize + " to 
force MySQL streaming ")
 }
 
-stmt.setLong(1, part.lower)
-stmt.setLong(2, part.upper)
+val parameterCount = stmt.getParameterMetaData.getParameterCount
+if (parameterCount > 0) {
--- End diff --

Not that there's anything wrong with backwards compatible 
fixes/enhancements, but a few things I noticed here:

1.  If it's a sufficiently small table that a user is only using 1 
partition, why not encourage them to query it from the driver and broadcast it 
(see the sketch after this list)?

2.  As it stands, it looks like you allow 0, 1, 2, or more ? placeholders, 
but the doc comment change only describes the 0 or 2 case.
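
A rough sketch of the driver-side query-and-broadcast pattern from point 1 
(assuming a live `SparkContext` named `sc`; the Derby URL and table name are 
purely illustrative):

```scala
import java.sql.DriverManager
import scala.collection.mutable.ArrayBuffer

// Query the small table once, on the driver...
val rows: Seq[(Int, Int)] = {
  val conn = DriverManager.getConnection("jdbc:derby:target/SmallDb")
  try {
    val rs = conn.createStatement.executeQuery("SELECT ID, DATA FROM SMALL_TABLE")
    val buf = ArrayBuffer.empty[(Int, Int)]
    while (rs.next()) buf += ((rs.getInt(1), rs.getInt(2)))
    buf.toSeq
  } finally {
    conn.close()
  }
}

// ...then ship one read-only copy to every executor, instead of having a
// single-partition JdbcRDD re-open the connection for each task attempt.
val smallTable = sc.broadcast(rows)
```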





[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-54917181
  
@liancheng are we still debugging issues here? or just waiting for it to 
pass?





[GitHub] spark pull request: [SQL][WIP] Refined Thrift server test suite

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2214#issuecomment-54917161
  
test this please





[GitHub] spark pull request: [SPARK-3395] [SQL] DSL sometimes incorrectly r...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2266#issuecomment-54917100
  
test this please





[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2291#issuecomment-54917073
  
test this please





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54917022
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20005/consoleFull)
 for   PR 2226 at commit 
[`15d877b`](https://github.com/apache/spark/commit/15d877bf457eb088d271000573592053ed1a505e).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3363][SQL] Type Coercion should support...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2246#issuecomment-54917020
  
test this please





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279616
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala 
---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql._
+
+import org.scalatest.BeforeAndAfter
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.TestSQLContext._
+
+class JdbcResultSetRDDSuite extends QueryTest with BeforeAndAfter {
+
+  before {
+Class.forName("org.apache.derby.jdbc.EmbeddedDriver")
+val conn = 
DriverManager.getConnection("jdbc:derby:target/JdbcSchemaRDDSuiteDb;create=true")
+try {
+  val create = conn.createStatement
+  create.execute("""
+CREATE TABLE FOO(
+  ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, 
INCREMENT BY 1),
+  DATA INTEGER
+)""")
+  create.close()
+  val insert = conn.prepareStatement("INSERT INTO FOO(DATA) VALUES(?)")
+  (1 to 100).foreach { i =>
+insert.setInt(1, i * 2)
+insert.executeUpdate
+  }
+  insert.close()
+} catch {
+  case e: SQLException if e.getSQLState == "X0Y32" =>
+// table exists
+} finally {
+  conn.close()
+}
+  }
+
+  test("basic functionality") {
+val jdbcResultSetRDD = 
jdbcResultSet("jdbc:derby:target/JdbcSchemaRDDSuiteDb", "SELECT DATA FROM FOO")
+jdbcResultSetRDD.registerAsTable("foo")
+
+checkAnswer(
+  sql("select count(*) from foo"),
+  100
+)
+checkAnswer(
+  sql("select sum(DATA) from foo"),
+  10100
+)
+  }
+
+  after {
+try {
+  DriverManager.getConnection("jdbc:derby:;shutdown=true")
+} catch {
+  case se: SQLException if se.getSQLState == "XJ015" =>
--- End diff --

Thanks!  Maybe we can add this as a comment in both places?





[GitHub] spark pull request: [SPARK-2489] [SQL] Parquet support for fixed_l...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1737#issuecomment-54916874
  
@joesu, thanks for clarifying the issues with reading data from the parquet 
library.  I like the idea of adding a new field to `BinaryType`, `fixedLength: 
Option[Int]`, that could be used to distinguish these two storage 
representations.  We can have this field default to `None` so we don't break any 
existing code.  In particular, since both types are going to be represented as 
`Array[Byte]` elsewhere in the Spark SQL execution engine, this means we don't 
have to add any extra handling code.  This is purely an optimization when 
writing out data.
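
A sketch of the shape being proposed (a hypothetical, simplified type 
hierarchy; not the actual catalyst source, where `BinaryType` is currently a 
case object):

```scala
// Hypothetical sketch: carry the fixed length as schema metadata while the
// runtime value stays Array[Byte] either way.
sealed trait DataType
case class BinaryType(fixedLength: Option[Int] = None) extends DataType

object BinaryTypeDemo extends App {
  val variable = BinaryType()         // default None: existing behavior
  val fixed16  = BinaryType(Some(16)) // would map to fixed_len_byte_array(16)
  println(s"$variable / $fixed16")
}
```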





[GitHub] spark pull request: [SPARK-2489] [SQL] Parquet support for fixed_l...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1737#issuecomment-54916880
  
ok to test





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279539
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala 
---
@@ -0,0 +1,75 @@
[... same JdbcResultSetRDDSuite hunk as quoted in full above ...]
+  case se: SQLException if se.getSQLState == "XJ015" =>
--- End diff --

http://db.apache.org/derby/papers/DerbyTut/embedded_intro.html

" A clean shutdown always throws SQL exception XJ015, which can be ignored. 
"





[GitHub] spark pull request: [SPARK-3393] [SQL] add configuration template ...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2263#issuecomment-54916301
  
It's not really clear to me what this buys over the log4j.template that is 
already there.  I'm also not sure what the xsl file is for.  Finally, an empty 
hive-site.xml file only saves you from adding the empty `<configuration>` 
element, which most users are probably going to copy from wherever they found 
the Hive options they want.  Given that, I think this might be more clutter 
than it's worth.  What do you think?





[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2245#issuecomment-54916333
  
test this please





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2284#issuecomment-54916291
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20002/consoleFull)
 for   PR 2284 at commit 
[`3644ffa`](https://github.com/apache/spark/commit/3644ffa46ac06adb0096df4f13bc03d0f3904eab).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3329: [SQL] Don't depend on Hive SET pai...

2014-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2220





[GitHub] spark pull request: SPARK-3329: [SQL] Don't depend on Hive SET pai...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2220#issuecomment-54916080
  
Thanks for cleaning this up!  Since this passed tests before, I'm going to 
merge to master.





[GitHub] spark pull request: [SPARK-3176] Implement 'ABS and 'LAST' for sql

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2099#issuecomment-54915989
  
Mind fixing the conflict so we can merge this?  Thanks!





[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-08 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/2087#issuecomment-54915960
  
I think we need some indication of the bytes being read from Hadoop. If 
this is our only current mechanism, then I think removing the code is not worth 
the behavioral regression.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1612#issuecomment-54915903
  
Thanks for working on this!  Several people have asked for it :)

Aside from the few minor style comments, it would be great if we could add 
APIs for Java and Python as well.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279173
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +69,28 @@ class JdbcRDD[T: ClassTag](
 }).toArray
   }
 
+  def getSchema: Seq[(String, Int, Boolean)] = {
--- End diff --

We should probably also make this `private[spark]`.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279150
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag](
   logInfo("statement fetch size set to: " + stmt.getFetchSize + " to 
force MySQL streaming ")
 }
 
-stmt.setLong(1, part.lower)
-stmt.setLong(2, part.upper)
+val parameterCount = stmt.getParameterMetaData.getParameterCount
+if (parameterCount > 0) {
--- End diff --

Yeah, I agree.  I was annoyed when I had to work around this.  As long as 
it's backwards compatible, I'm okay including it here.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279109
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala 
---
@@ -0,0 +1,75 @@
[... same JdbcResultSetRDDSuite hunk as quoted in full above ...]
+  case se: SQLException if se.getSQLState == "XJ015" =>
--- End diff --

Hmm, yeah good question... Perhaps @koeninger, who wrote the original test, 
could enlighten us?





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279068
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcTypes.scala 
---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.catalyst.types._
+
+private[sql] object JdbcTypes extends Logging {
+
+  /**
+   * More about JDBC types mapped to Java types:
+   *   
http://docs.oracle.com/javase/6/docs/technotes/guides/jdbc/getstart/mapping.html#1051555
+   *
+   * Compatibility of ResultSet getter Methods defined in JDBC spec:
+   *   
http://download.oracle.com/otn-pub/jcp/jdbc-4_1-mrel-spec/jdbc4.1-fr-spec.pdf
+   *   page 211
+   */
+  def toPrimitiveDataType(jdbcType: Int): DataType =
+jdbcType match {
+  case java.sql.Types.LONGVARCHAR
+ | java.sql.Types.VARCHAR
+ | java.sql.Types.CHAR=> StringType
+  case java.sql.Types.NUMERIC
+ | java.sql.Types.DECIMAL => DecimalType
+  case java.sql.Types.BIT => BooleanType
+  case java.sql.Types.TINYINT => ByteType
+  case java.sql.Types.SMALLINT=> ShortType
+  case java.sql.Types.INTEGER => IntegerType
+  case java.sql.Types.BIGINT  => LongType
+  case java.sql.Types.REAL=> FloatType
+  case java.sql.Types.FLOAT
+ | java.sql.Types.DOUBLE  => DoubleType
+  case java.sql.Types.LONGVARBINARY
+ | java.sql.Types.VARBINARY
+ | java.sql.Types.BINARY  => BinaryType
+  // Timestamp's getter should also be able to get DATE and TIME 
according to JDBC spec
+  case java.sql.Types.TIMESTAMP
+ | java.sql.Types.DATE
+ | java.sql.Types.TIME=> TimestampType
+
+  // TODO: CLOB only works with getClob or getAscIIStream
+  // case java.sql.Types.CLOB
+
+  // TODO: BLOB only works with getBlob or getBinaryStream
+  // case java.sql.Types.BLOB
+
+  // TODO: nested types
+  // case java.sql.Types.ARRAY => ArrayType
+  // case java.sql.Types.STRUCT=> StructType
+
+  // TODO: unsupported types
+  // case java.sql.Types.DISTINCT
+  // case java.sql.Types.REF
+
+  // TODO: more about JAVA_OBJECT:
+  //   
http://docs.oracle.com/javase/6/docs/technotes/guides/jdbc/getstart/mapping.html#1038181
+  // case java.sql.Types.JAVA_OBJECT => BinaryType
+
+  case _ => sys.error(
--- End diff --

Same here: include the type that isn't supported.





[GitHub] spark pull request: [SPARK-2491] Don't handle uncaught exceptions ...

2014-09-08 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/1482#issuecomment-54915526
  
Well, I mean, this is our attempt to tell people what happened:

```
execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
```

We cannot expect that this logic was already called once, I believe, 
because the executor may enter shutdown mode based on a different thread 
receiving an OOM.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279050
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -205,6 +208,54 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   }
 
   /**
+   * Loads from JDBC, returning the ResultSet as a [[SchemaRDD]].
+   * It gets MetaData from ResultSet of PreparedStatement to determine the 
schema.
+   *
+   * @group userf
+   */
+  def jdbcResultSet(
--- End diff --

For the 1.2 release we are going to be focusing on adding more external 
datasources.  As part of this we are trying to change the way we add them, to 
avoid SQLContext getting too large.  What do you think about adding an object, 
`org.apache.spark.sql.jdbc.JDBC`, that has these methods instead of adding them 
here?
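
A sketch of what that layout could look like (a hypothetical skeleton; the 
object name and signature are assumptions, not a settled design):

```scala
package org.apache.spark.sql.jdbc

import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Keeps the JDBC-specific entry points out of SQLContext itself.
object JDBC {
  def resultSet(sqlContext: SQLContext, url: String, query: String): SchemaRDD = {
    // Build the JdbcRDD, infer the schema from the ResultSet metadata,
    // and wrap the rows as a SchemaRDD, as this PR's jdbcResultSet does.
    ???  // left unimplemented in this sketch
  }
}
```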





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279042
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDD.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.ResultSet
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.rdd.JdbcRDD
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan}
+import org.apache.spark.Logging
+
+private[sql] object JdbcResultSetRDD extends Logging {
+
+  private[sql] def inferSchema(
+  jdbcResultSet: JdbcRDD[ResultSet]): StructType = {
+StructType(createSchema(jdbcResultSet.getSchema))
+  }
+
+  private def createSchema(metaSchema: Seq[(String, Int, Boolean)]): 
Seq[StructField] = {
+metaSchema.map(e => StructField(e._1, 
JdbcTypes.toPrimitiveDataType(e._2), e._3))
+  }
+
+  private[sql] def jdbcResultSetToRow(
+  jdbcResultSet: JdbcRDD[ResultSet],
+  schema: StructType) : RDD[Row] = {
+val row = new GenericMutableRow(schema.fields.length)
+jdbcResultSet.map(asRow(_, row, schema.fields))
--- End diff --

If you are going to reuse the row object (which is a good idea), I'd 
use `mapPartitions` instead and create the object inside of the closure.
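
Concretely, the suggestion amounts to something like this drop-in 
replacement for the method in the diff above (a sketch; it reuses the same 
imports and the `asRow` helper from that file):

```scala
// One mutable row per partition, created inside the closure so no
// driver-side instance is captured and serialized with the task.
private[sql] def jdbcResultSetToRow(
    jdbcResultSet: JdbcRDD[ResultSet],
    schema: StructType): RDD[Row] = {
  jdbcResultSet.mapPartitions { iter =>
    val row = new GenericMutableRow(schema.fields.length)
    iter.map(asRow(_, row, schema.fields))
  }
}
```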





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17279021
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDD.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.ResultSet
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.rdd.JdbcRDD
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan}
+import org.apache.spark.Logging
+
+private[sql] object JdbcResultSetRDD extends Logging {
+
+  private[sql] def inferSchema(
+  jdbcResultSet: JdbcRDD[ResultSet]): StructType = {
+StructType(createSchema(jdbcResultSet.getSchema))
+  }
+
+  private def createSchema(metaSchema: Seq[(String, Int, Boolean)]): 
Seq[StructField] = {
+metaSchema.map(e => StructField(e._1, 
JdbcTypes.toPrimitiveDataType(e._2), e._3))
+  }
+
+  private[sql] def jdbcResultSetToRow(
+  jdbcResultSet: JdbcRDD[ResultSet],
+  schema: StructType) : RDD[Row] = {
+val row = new GenericMutableRow(schema.fields.length)
+jdbcResultSet.map(asRow(_, row, schema.fields))
+  }
+
+  private def asRow(rs: ResultSet, row: GenericMutableRow, schemaFields: 
Seq[StructField]): Row = {
+var i = 0
+while (i < schemaFields.length) {
+  schemaFields(i).dataType match {
+case StringType  => row.update(i, rs.getString(i + 1))
+case DecimalType => row.update(i, rs.getBigDecimal(i + 1))
+case BooleanType => row.update(i, rs.getBoolean(i + 1))
+case ByteType=> row.update(i, rs.getByte(i + 1))
+case ShortType   => row.update(i, rs.getShort(i + 1))
+case IntegerType => row.update(i, rs.getInt(i + 1))
+case LongType=> row.update(i, rs.getLong(i + 1))
+case FloatType   => row.update(i, rs.getFloat(i + 1))
+case DoubleType  => row.update(i, rs.getDouble(i + 1))
+case BinaryType  => row.update(i, rs.getBytes(i + 1))
+case TimestampType => row.update(i, rs.getTimestamp(i + 1))
+case _ => sys.error(
+  s"Unsupported jdbc datatype")
--- End diff --

Would be good to print what the unsupported type is.  Also, try to wrap at 
the highest syntactic level, for example:

```scala
case unsupportedType =>
  sys.error(s"Unsupported jdbc datatype: $unsupportedType")
```

(Though actually in this case I think it'll all fit on one line).





[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/2294#discussion_r17278884
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +199,369 @@ private[mllib] object BLAS extends Serializable {
 throw new IllegalArgumentException(s"scal doesn't support vector 
type ${x.getClass}.")
 }
   }
+
+  // For level-3 routines, we use the native BLAS.
+  private def nativeBLAS: NetlibBLAS = {
+if (_nativeBLAS == null) {
+  _nativeBLAS = NativeBLAS
+}
+_nativeBLAS
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * @param transA specify whether to use matrix A, or the transpose of 
matrix A. Should be "N" or
+   *   "n" to use A, and "T" or "t" to use the transpose of A.
+   * @param transB specify whether to use matrix B, or the transpose of 
matrix B. Should be "N" or
+   *   "n" to use B, and "T" or "t" to use the transpose of B.
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x 
k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x 
n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+  transA: Boolean,
+  transB: Boolean,
+  alpha: Double,
+  A: Matrix,
+  B: DenseMatrix,
+  beta: Double,
+  C: DenseMatrix): Unit = {
+A match {
+  case sparse: SparseMatrix =>
+gemm(transA, transB, alpha, sparse, B, beta, C)
+  case dense: DenseMatrix =>
+gemm(transA, transB, alpha, dense, B, beta, C)
+  case _ =>
+throw new IllegalArgumentException(s"gemm doesn't support matrix 
type ${A.getClass}.")
+}
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   *
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x 
k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x 
n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+  alpha: Double,
+  A: Matrix,
+  B: DenseMatrix,
+  beta: Double,
+  C: DenseMatrix) {
+gemm(false, false, alpha, A, B, beta, C)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `DenseMatrix` A.
+   */
+  private def gemm(
+  transA: Boolean,
+  transB: Boolean,
+  alpha: Double,
+  A: DenseMatrix,
+  B: DenseMatrix,
+  beta: Double,
+  C: DenseMatrix) {
+val mA: Int = if (!transA) A.numRows else A.numCols
+val nB: Int = if (!transB) B.numCols else B.numRows
+val kA: Int = if (!transA) A.numCols else A.numRows
+val kB: Int = if (!transB) B.numRows else B.numCols
+val tAstr = if (!transA) "N" else "T"
+val tBstr = if (!transB) "N" else "T"
+
+require(kA == kB, s"The columns of A don't match the rows of B. A: 
$kA, B: $kB")
+require(mA == C.numRows, s"The rows of C don't match the rows of A. C: 
${C.numRows}, A: $mA")
+require(nB == C.numCols,
+  s"The columns of C don't match the columns of B. C: ${C.numCols}, A: 
$nB")
+
+nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, 
B.values, B.numRows,
+  beta, C.values, C.numRows)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `SparseMatrix` A.
+   */
+  private def gemm(
+  transA: Boolean,
+  transB: Boolean,
+  alpha: Double,
+  A: SparseMatrix,
+  B: DenseMatrix,
+  beta: Double,
+  C: DenseMatrix): Unit = {
+val mA: Int = if (!transA) A.numRows else A.numCols
+val nB: Int = if (!transB) B.numCols else B.numRows
+val kA: Int = if (!transA) A.numCols else A.numRows
+val kB: Int = if (!transB) B.numRows else B.numCols
+
+require(kA == kB, s"The columns of A don't match the rows of B. A: 
$kA, B: $kB")
+require(mA == C.numRows, s"The rows of C don't match the rows of A. C: 
${C.numRows}, A: $mA")
+require(nB == C.numCols,
+  s"The columns of C don't match the columns of B. C: ${C.numCols}, A: 
$nB")
+
+val Avals = A.values
+val Arows = if (!transA) A.rowIndices else A.colPtrs
+val Acols = if (!transA) A.colPtrs else A.rowIndices
+
+// Slicing is easy in this case. This is the optimal multiplication 
setting for sparse matrices
+if (transA){
+  var colCounterForB = 0
+  if (!
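
The quoted hunk above is cut off mid-line in the archive; for context, here 
is a small usage sketch of the public `gemm` entry point it adds (a sketch: 
it assumes MLlib's column-major `DenseMatrix(numRows, numCols, values)` 
constructor, and since `BLAS` is `private[mllib]` the call would need to live 
inside the mllib package):

```scala
import org.apache.spark.mllib.linalg.DenseMatrix

// C := alpha * A * B + beta * C, all values stored column-major.
val A = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0)) // [[1, 2], [3, 4]]
val B = new DenseMatrix(2, 2, Array(1.0, 0.0, 0.0, 1.0)) // 2x2 identity
val C = new DenseMatrix(2, 2, Array(0.0, 0.0, 0.0, 0.0))

BLAS.gemm(1.0, A, B, 0.0, C) // C now holds A * B, i.e. a copy of A
```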

[GitHub] spark pull request: [SPARK-3447][SQL] Remove explicit conversion w...

2014-09-08 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2323#issuecomment-54914904
  
`JsonRDD` and the Java API of `Row` are also using wrappers. Should we 
check whether these places will also trigger the NPE?





[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/2294#discussion_r17278854
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
[Quotes the same `gemm` hunk from BLAS.scala shown in full above; the rest of 
this message is truncated in the archive.]

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r17278868
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +69,28 @@ class JdbcRDD[T: ClassTag](
 }).toArray
   }
 
+  def getSchema: Seq[(String, Int, Boolean)] = {
--- End diff --

Can you add this as a comment here?





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-08 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request:

https://github.com/apache/spark/pull/1616#discussion_r17278861
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -313,14 +313,74 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file requested by the executor . Supports fetching the 
file in a variety of ways,
+   * including HTTP, HDFS and files on a standard filesystem, based on the 
URL parameter.
+   *
+   * If `useCache` is true, first attempts to fetch the file from a local 
cache that's shared across
+   * executors running the same application.
+   *
+   * Throws SparkException if the target file already exists and has 
different contents than
+   * the requested file.
+   */
+  def fetchFile(
+  url: String,
+  targetDir: File,
+  conf: SparkConf,
+  securityMgr: SecurityManager,
+  hadoopConf: Configuration,
+  timestamp: Long,
+  useCache: Boolean) {
+val fileName = url.split("/").last
+val targetFile = new File(targetDir, fileName)
+if (useCache) {
+  val cachedFileName = url.hashCode + timestamp + "_cach"
--- End diff --

done





[GitHub] spark pull request: [SPARK-3414][SQL] Stores analyzed logical plan...

2014-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2293





[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/2294#discussion_r17278837
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
[Quotes the same `gemm` hunk from BLAS.scala shown in full above; the rest of 
this message is truncated in the archive.]

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/2294#discussion_r17278816
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +199,369 @@ private[mllib] object BLAS extends Serializable {
        throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
    }
  }
+
+  // For level-3 routines, we use the native BLAS.
+  private def nativeBLAS: NetlibBLAS = {
+    if (_nativeBLAS == null) {
+      _nativeBLAS = NativeBLAS
+    }
+    _nativeBLAS
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
+   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: Matrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    A match {
+      case sparse: SparseMatrix =>
+        gemm(transA, transB, alpha, sparse, B, beta, C)
+      case dense: DenseMatrix =>
+        gemm(transA, transB, alpha, dense, B, beta, C)
+      case _ =>
+        throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
+    }
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   *
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+      alpha: Double,
+      A: Matrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix) {
+    gemm(false, false, alpha, A, B, beta, C)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `DenseMatrix` A.
+   */
+  private def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: DenseMatrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix) {
+    val mA: Int = if (!transA) A.numRows else A.numCols
+    val nB: Int = if (!transB) B.numCols else B.numRows
+    val kA: Int = if (!transA) A.numCols else A.numRows
+    val kB: Int = if (!transB) B.numRows else B.numCols
+    val tAstr = if (!transA) "N" else "T"
+    val tBstr = if (!transB) "N" else "T"
+
+    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
+    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
+    require(nB == C.numCols,
+      s"The columns of C don't match the columns of B. C: ${C.numCols}, B: $nB")
+
+    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
+      beta, C.values, C.numRows)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `SparseMatrix` A.
+   */
+  private def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: SparseMatrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    val mA: Int = if (!transA) A.numRows else A.numCols
+    val nB: Int = if (!transB) B.numCols else B.numRows
+    val kA: Int = if (!transA) A.numCols else A.numRows
+    val kB: Int = if (!transB) B.numRows else B.numCols
+
+    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
+    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
+    require(nB == C.numCols,
+      s"The columns of C don't match the columns of B. C: ${C.numCols}, B: $nB")
+
+    val Avals = A.values
+    val Arows = if (!transA) A.rowIndices else A.colPtrs
+    val Acols = if (!transA) A.colPtrs else A.rowIndices
+
+    // Slicing is easy in this case. This is the optimal multiplication setting
+    // for sparse matrices.
+    if (transA) {
+      var colCounterForB = 0
+      if (!
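
For context, a minimal sketch of driving the gemm entry point quoted above.
This is not from the PR: it assumes MLlib's column-major
DenseMatrix(numRows, numCols, values) constructor from the same change and
ignores the object's private[mllib] visibility.

    import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix}

    // C := 2.0 * A * B + 1.0 * C with A: 2x3, B: 3x2, C: 2x2 (values column-major).
    val A = new DenseMatrix(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))
    val B = new DenseMatrix(3, 2, Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0))
    val C = new DenseMatrix(2, 2, Array(1.0, 1.0, 1.0, 1.0))

    // Dispatches on the runtime type of A: a DenseMatrix routes to
    // nativeBLAS.dgemm, a SparseMatrix to the hand-rolled sparse loop.
    // C.values is updated in place.
    BLAS.gemm(2.0, A, B, 1.0, C)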

[GitHub] spark pull request: [SPARK-3414][SQL] Stores analyzed logical plan...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2293#issuecomment-54914812
  
Thanks!  I've merged this to master.





[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/2294#discussion_r17278811
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +199,369 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: SPARK-3423: [SQL] Implement BETWEEN for SQLPar...

2014-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2295





[GitHub] spark pull request: SPARK-3423: [SQL] Implement BETWEEN for SQLPar...

2014-09-08 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2295#issuecomment-54914672
  
Thanks Will!  I've merged this to master.





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-08 Thread li-zhihui
Github user li-zhihui commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-54914557
  
@andrewor14 
In yarn mode, these cache files are cleaned up automatically; in standalone 
mode, this is not handled yet.

Currently, in standalone mode, the application work directory 
SPARK_HOME/work/APPLICATION_ID on slave nodes is not cleaned up either. 
If that issue (cleaning up the application work directory) were resolved, we 
could use the application work directory as the cache directory.





[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2291#discussion_r17278691
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataTypeSuite.scala ---
@@ -55,4 +55,38 @@ class DataTypeSuite extends FunSuite {
      struct(Set("b", "d", "e", "f"))
    }
  }
+
+  test("StructField.toString") {
+    def structFieldWithName(name: String) = StructField(name, StringType, nullable = true)
+
+    assertResult("""StructField("a",StringType,true)""") {
+      structFieldWithName("a").toString
+    }
+
+    assertResult("""StructField("(a)",StringType,true)""") {
+      structFieldWithName("(a)").toString
+    }
+
+    assertResult("""StructField("a\\b\"",StringType,true)""") {
+      structFieldWithName("a\\b\"").toString
+    }
+  }
+
+  test("parsing StructField string") {
+    val expected = StructType(
+      StructField("a", StringType, true) ::
+      StructField("\"b\"", StringType, true) ::
+      StructField("\"c\\", StringType, true) ::
+      Nil)
+
+    val structTypeString = Seq(
+      """StructType(List(""",
+      """StructField("a",StringType,true),""",
+      """StructField("\"b\"",StringType,true),""",
+      """StructField("\"c\\",StringType,true)""",
+      """))"""
+    ).mkString
+
+    assert(catalyst.types.DataType(structTypeString) === expected)
--- End diff --

This is kind of a nit, but I think I'd prefer tests that just roundtrip 
StructFields with various weird characters, instead of tests that depend on 
the exact output. That would still verify the desired behavior, but wouldn't 
have to be rewritten if we ever change the format. (I mostly say this because 
I just spent the last hour rewriting brittle parquet tests :) )
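
A roundtrip-style test of the kind suggested here might look like the sketch
below. It reuses the structFieldWithName helper and the catalyst.types.DataType
parser from the diff above; the particular set of names is illustrative.

    test("roundtrip StructFields with special characters") {
      Seq("a", "(a)", "a\\b\"", "col,with,commas").foreach { name =>
        val original = StructType(structFieldWithName(name) :: Nil)
        // Serialize to the string format, then parse it back; only the
        // roundtripped value is asserted, not the exact serialized text.
        val parsed = catalyst.types.DataType(original.toString)
        assert(parsed === original)
      }
    }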





[GitHub] spark pull request: [SPARK-3443][MLLIB] update default values of t...

2014-09-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2322





[GitHub] spark pull request: [SPARK-3443][MLLIB] update default values of t...

2014-09-08 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2322#issuecomment-54914389
  
I've merged this into master. Thanks @jkbradley for review!





[GitHub] spark pull request: [SPARK-1981][Streaming] Updated kinesis docs a...

2014-09-08 Thread cfregly
Github user cfregly closed the pull request at:

https://github.com/apache/spark/pull/2306





[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1360#issuecomment-54914053
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20006/consoleFull)
 for   PR 1360 at commit 
[`f099c0b`](https://github.com/apache/spark/commit/f099c0b2654759ab5cbfe2bc91cedac10f3bf77f).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3193]output errer info when Process exi...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2108#issuecomment-54913894
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20007/consoleFull)
 for   PR 2108 at commit 
[`563fde1`](https://github.com/apache/spark/commit/563fde16386ce3188331f1ae5b52424e1c4447ff).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3443][MLLIB] update default values of t...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2322#issuecomment-54913931
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20003/consoleFull)
 for   PR 2322 at commit 
[`cda453a`](https://github.com/apache/spark/commit/cda453a237fc8a93b5764e09ef689af1fadf8063).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2257#issuecomment-54913778
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20004/consoleFull)
 for   PR 2257 at commit 
[`68fbbbf`](https://github.com/apache/spark/commit/68fbbbfc03d9d18eec58c1a6dad058014157e9da).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3448][SQL] Check for null in SpecificMu...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2325#issuecomment-54913697
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20008/consoleFull)
 for   PR 2325 at commit 
[`9366c44`](https://github.com/apache/spark/commit/9366c44ad6c9f65d074b93fce96ec6c5b6b17ad6).
 * This patch merges cleanly.





[GitHub] spark pull request: Add ValueIncrementableHashMapAccumulator

2014-09-08 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2314#issuecomment-54913668
  
Thanks for the PR but this doesn't seem like something that needs to be in 
Spark -- we already have a histogram() implementation, and users can always 
build their own accumulator. I think that by the time people figure out this 
exists, they could've built their own class in this case.
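
For reference, the kind of do-it-yourself accumulator referred to here takes
only a few lines with the Spark 1.x AccumulatorParam API. A sketch, with sc and
words assumed to be an existing SparkContext and RDD[String]:

    import org.apache.spark.AccumulatorParam

    // Merges maps by summing values per key, i.e. a value-incrementable counter.
    object CountByKeyParam extends AccumulatorParam[Map[String, Int]] {
      def zero(initial: Map[String, Int]): Map[String, Int] = Map.empty
      def addInPlace(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
        (m1.keySet ++ m2.keySet).map(k => k -> (m1.getOrElse(k, 0) + m2.getOrElse(k, 0))).toMap
    }

    val counts = sc.accumulator(Map.empty[String, Int])(CountByKeyParam)
    words.foreach(w => counts += Map(w -> 1))
    // counts.value holds the merged per-key totals on the driver.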





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-08 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request:

https://github.com/apache/spark/pull/1616#discussion_r17278282
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -313,14 +313,74 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file requested by the executor. Supports fetching the file in a variety of
+   * ways, including HTTP, HDFS and files on a standard filesystem, based on the URL parameter.
+   *
+   * If `useCache` is true, first attempts to fetch the file from a local cache that's shared
+   * across executors running the same application.
+   *
+   * Throws SparkException if the target file already exists and has different contents than
+   * the requested file.
+   */
+  def fetchFile(
+      url: String,
+      targetDir: File,
+      conf: SparkConf,
+      securityMgr: SecurityManager,
+      hadoopConf: Configuration,
+      timestamp: Long,
+      useCache: Boolean) {
+    val fileName = url.split("/").last
+    val targetFile = new File(targetDir, fileName)
+    if (useCache) {
+      val cachedFileName = url.hashCode + timestamp + "_cach"
+      val lockFileName = url.hashCode + timestamp + "_lock"
+      val localDir = new File(getLocalDir(conf))
+      val lockFile = new File(localDir, lockFileName)
+      val raf = new RandomAccessFile(lockFile, "rw")
+      // Only one executor entry.
+      // The FileLock is only used to control synchronization for executors downloading
+      // the file; it's always safe regardless of the lock type (mandatory or advisory).
+      val lock = raf.getChannel().lock()
+      val cachedFile = new File(localDir, cachedFileName)
+      try {
+        if (!cachedFile.exists()) {
+          doFetchFile(url, localDir, conf, securityMgr, hadoopConf)
+          Files.move(new File(localDir, fileName), cachedFile)
+        }
+      } finally {
+        lock.release()
+      }
+      Files.copy(cachedFile, targetFile)
--- End diff --

I think it's OK, but right now executors use these files as if they were in 
their work directory ./. Maybe we can optimize away the copy in a follow-up 
patch if this patch proves to work well.
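
One way the copy could later be avoided, purely as a sketch and not part of
this patch: hard-link the cached file into the work directory where the
filesystem allows it, and fall back to a plain copy where it doesn't.

    import java.io.File
    import java.nio.file.{Files => JFiles}
    import com.google.common.io.{Files => GFiles}

    // Hypothetical helper: a hard link shares the cached bytes without copying.
    def linkOrCopy(cachedFile: File, targetFile: File): Unit = {
      try {
        JFiles.createLink(targetFile.toPath, cachedFile.toPath)
      } catch {
        case _: Exception => GFiles.copy(cachedFile, targetFile)
      }
    }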




