[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13990
  
Merged build finished. Test PASSed.


[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13990
  
**[Test build #61685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61685/consoleFull)** for PR 13990 at commit [`fa294bc`](https://github.com/apache/spark/commit/fa294bc04896ed61efb847adc2612feca9cbaf16).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390423
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/OneHotEncoderSuite.scala ---
@@ -29,10 +29,11 @@ import org.apache.spark.sql.types._
 
 class OneHotEncoderSuite
   extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+  import testImplicits._
 
   def stringIndexed(): DataFrame = {
     val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
--- End diff --

Remove `sc.parallelize`.
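
For readers following along: jaceklaskowski's comments throughout this PR boil down to one pattern. A minimal sketch of the refactoring (column names assumed, inside a suite that mixes in `MLlibTestSparkContext`):

```
// Before: explicit RDD plumbing through the SparkContext.
val before = spark.createDataFrame(sc.parallelize(Seq((0, "a"), (1, "b")), 2))

// After: with `import testImplicits._` in scope, a local Seq converts directly.
import testImplicits._
val after = Seq((0, "a"), (1, "b")).toDF("id", "label")
```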


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390388
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala ---
@@ -61,7 +62,7 @@ class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
       Vectors.sparse(3, Seq())
     )
 
-    dataFrame = spark.createDataFrame(sc.parallelize(data, 2).map(NormalizerSuite.FeatureData))
+    dataFrame = sc.parallelize(data, 2).map(NormalizerSuite.FeatureData).toDF()
--- End diff --

Remove `sc.parallelize`


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390273
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -57,8 +58,7 @@ class MinMaxScalerSuite extends SparkFunSuite with MLlibTestSparkContext with De
 
   test("MinMaxScaler arguments max must be larger than min") {
     withClue("arguments max must be larger than min") {
-      val dummyDF = spark.createDataFrame(Seq(
-        (1, Vectors.dense(1.0, 2.0)))).toDF("id", "feature")
+      val dummyDF = Seq((1, Vectors.dense(1.0, 2.0))).toDF("id", "feature")
--- End diff --

It's just a column name, but for consistency...`features` (not `feature`) 
(unless there's a reason for this)


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390216
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -44,7 +45,7 @@ class CountVectorizerSuite extends SparkFunSuite with MLlibTestSparkContext
   (3, split(""), Vectors.sparse(4, Seq())), // empty string
--- End diff --

Replace the comment `// empty string` with `val EMPTY_STRING = ""`


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390204
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala ---
@@ -42,9 +43,10 @@ class RegressionEvaluatorSuite
      * data.map(x=> x.label + ", " + x.features(0) + ", " + x.features(1))
      *   .saveAsTextFile("path")
      */
-    val dataset = spark.createDataFrame(
-      sc.parallelize(LinearDataGenerator.generateLinearInput(
-        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 100, 42, 0.1), 2).map(_.asML))
+    val dataset = sc.parallelize(
--- End diff --

Remove `sc.parallelize`


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390189
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -158,7 +159,7 @@ class RandomForestClassifierSuite
   }
 
   test("Fitting without numClasses in metadata") {
-    val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc))
+    val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
--- End diff --

Why is the type annotation needed here?


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390185
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -55,7 +56,7 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
     rdd = sc.parallelize(generateMultinomialLogisticInput(
       coefficients, xMean, xVariance, true, nPoints, 42), 2)
-    dataset = spark.createDataFrame(rdd)
+    dataset = rdd.toDF()
--- End diff --

Merge it with line 57.


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390178
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala ---
@@ -47,7 +48,7 @@ class NaiveBayesSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
       Array(0.10, 0.10, 0.70, 0.10)  // label 2
     ).map(_.map(math.log))
 
-    dataset = spark.createDataFrame(generateNaiveBayesInput(pi, theta, 100, 42))
+    dataset = generateNaiveBayesInput(pi, theta, 100, 42).toDF()
--- End diff --

Exactly my point above :)


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390173
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
@@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite
     // the input seed is somewhat magic, to make this test pass
     val rdd = sc.parallelize(generateMultinomialLogisticInput(
       coefficients, xMean, xVariance, true, nPoints, 1), 2)
-    val dataFrame = spark.createDataFrame(rdd).toDF("label", "features")
+    val dataFrame = rdd.toDF("label", "features")
--- End diff --

Could we merge this line with line 118? I don't think line 118 needs `sc.parallelize`.


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390144
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -55,7 +56,7 @@ class LogisticRegressionSuite
       generateMultinomialLogisticInput(coefficients, xMean, xVariance,
         addIntercept = true, nPoints, 42)
 
-      spark.createDataFrame(sc.parallelize(testData, 4))
+      sc.parallelize(testData, 4).toDF()
--- End diff --

`testData.toDF.repartition(4)`?
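
One caveat on this suggestion: the two forms yield the same partition count but are not identical, since `repartition` shuffles an already-built DataFrame while `sc.parallelize(_, 4)` slices the local data up front. A sketch, assuming `testData` is a local `Seq` of case-class instances and `testImplicits` is in scope:

```
val viaParallelize = sc.parallelize(testData, 4).toDF()  // 4 slices at creation
val viaRepartition = testData.toDF().repartition(4)      // 4 partitions via a shuffle
assert(viaParallelize.rdd.getNumPartitions == viaRepartition.rdd.getNumPartitions)
```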


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390147
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -869,8 +870,7 @@ class LogisticRegressionSuite
     }
   }
 
-      (spark.createDataFrame(sc.parallelize(data1, 4)),
-        spark.createDataFrame(sc.parallelize(data2, 4)))
+      (sc.parallelize(data1, 4).toDF(), sc.parallelize(data2, 4).toDF())
--- End diff --

Same as above


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390132
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -134,15 +135,14 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   */
 
   test("Fitting without numClasses in metadata") {
-    val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc))
+    val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
     val gbt = new GBTClassifier().setMaxDepth(1).setMaxIter(1)
     gbt.fit(df)
--- End diff --

Wonder why this line is separate and not part of line 139? Any reason?


[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390117
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ClassifierSuite.scala ---
@@ -71,8 +71,7 @@ class ClassifierSuite extends SparkFunSuite with MLlibTestSparkContext {
 
   test("getNumClasses") {
     def getTestData(labels: Seq[Double]): DataFrame = {
--- End diff --

Repeated. What about moving it outside the `test` methods?


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61686/
Test FAILed.


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Merged build finished. Test FAILed.


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #61686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61686/consoleFull)** for PR 14036 at commit [`5db17fd`](https://github.com/apache/spark/commit/5db17fd2e05578bd152d9f2654d49aa4e07abe5b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14031
  
**[Test build #61687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61687/consoleFull)** for PR 14031 at commit [`939e8b5`](https://github.com/apache/spark/commit/939e8b5d3a3b502f3a7870d437cb38ee9564e6c4).


[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...

2016-07-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14031
  
Jenkins add to whitelist


[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...

2016-07-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14031
  
LGTM


[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...

2016-07-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14031
  
Jenkins test this please


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #61686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61686/consoleFull)** for PR 14036 at commit [`5db17fd`](https://github.com/apache/spark/commit/5db17fd2e05578bd152d9f2654d49aa4e07abe5b).


[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-03 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
cc: @cloud-fan @rxin 
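
For readers unfamiliar with the function under review: `str_to_map` follows Hive's semantics, splitting text into key/value pairs using two delimiters (assumed here to default to `,` and `:`). A usage sketch:

```
// str_to_map(text[, pairDelim[, keyValueDelim]])
spark.sql("SELECT str_to_map('a:1,b:2,c:3', ',', ':')").show(truncate = false)
// expected: Map(a -> 1, b -> 2, c -> 3)
```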


[GitHub] spark issue #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL functio...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13990
  
**[Test build #61685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61685/consoleFull)** for PR 13990 at commit [`fa294bc`](https://github.com/apache/spark/commit/fa294bc04896ed61efb847adc2612feca9cbaf16).


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9207
  
Merged build finished. Test FAILed.


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9207
  
**[Test build #61684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61684/consoleFull)** for PR 9207 at commit [`e6845f1`](https://github.com/apache/spark/commit/e6845f1c37481b270359a6a4291c07dcaace242e).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9207
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61684/
Test FAILed.


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9207
  
**[Test build #61684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61684/consoleFull)** for PR 9207 at commit [`e6845f1`](https://github.com/apache/spark/commit/e6845f1c37481b270359a6a4291c07dcaace242e).


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9207
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61683/
Test FAILed.


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9207
  
**[Test build #61683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61683/consoleFull)** for PR 9207 at commit [`49f8a8d`](https://github.com/apache/spark/commit/49f8a8d540860e28de8a356616f78a64ab112909).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9207
  
Merged build finished. Test FAILed.


[GitHub] spark pull request #13532: [SPARK-15204][SQL] improve nullability inference ...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13532#discussion_r69388379
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ---
@@ -305,4 +305,13 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext {
     val ds = Seq(1, 2, 3).toDS()
     checkDataset(ds.select(MapTypeBufferAgg.toColumn), 1)
   }
+
+  test("spark-15204 improve nullability inference for Aggregator") {
+    val ds1 = Seq(1, 3, 2, 5).toDS()
+    assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable === false)
+    val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS()
+    assert(ds2.groupByKey(_.b).agg(SeqAgg.toColumn).schema(1).nullable === true)
--- End diff --

Should we use `ds.select` for all tests? And we should also test `String`, which is flat but nullable.
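
A sketch of the extra case being asked for, using a hypothetical `NameAgg` helper (an `Aggregator[AggData, String, String]`); since `String` is flat but nullable, the inferred schema field should stay nullable:

```
val ds3 = Seq(AggData(1, "a"), AggData(2, "b")).toDS()
assert(ds3.groupByKey(_.b).agg(NameAgg.toColumn).schema(1).nullable === true)
```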


[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69388270
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,38 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n  [1,a]\n  [2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function $prettyName should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) => et
+  }
+
+  private lazy val numFields = elementSchema.fields.length
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    val inputArray = child.eval(input).asInstanceOf[ArrayData]
+    if (inputArray == null) {
+      Nil
+    } else {
+      for (i <- 0 until inputArray.numElements())
+        yield inputArray.getStruct(i, numFields)
--- End diff --

I'm not sure about the performance of `for-yield`; maybe it's safer to create an array manually and use a while loop here?
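
A sketch of the manual-loop alternative being suggested, slotting into the quoted `Inline.eval` (my reconstruction, not the PR author's code):

```
override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
  val inputArray = child.eval(input).asInstanceOf[ArrayData]
  if (inputArray == null) {
    Nil
  } else {
    // Preallocate and fill with a while loop, avoiding Range/yield overhead.
    val rows = new Array[InternalRow](inputArray.numElements())
    var i = 0
    while (i < rows.length) {
      rows(i) = inputArray.getStruct(i, numFields)
      i += 1
    }
    rows
  }
}
```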


[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9207
  
**[Test build #61683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61683/consoleFull)** for PR 9207 at commit [`49f8a8d`](https://github.com/apache/spark/commit/49f8a8d540860e28de8a356616f78a64ab112909).


[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14035
  
Merged build finished. Test PASSed.


[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14035
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61681/
Test PASSed.


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #61682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61682/consoleFull)** for PR 14036 at commit [`e4e42c3`](https://github.com/apache/spark/commit/e4e42c35b0236dff6aedf6468a7d94f80bc6023b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class MapKeys(child: Expression)`
  * `case class MapValues(child: Expression)`


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61682/
Test FAILed.


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Merged build finished. Test FAILed.


[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14035
  
**[Test build #61681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61681/consoleFull)** for PR 14035 at commit [`6fdf290`](https://github.com/apache/spark/commit/6fdf290d682b190f6b0457bb084cebacd105ee00).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #61682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61682/consoleFull)** for PR 14036 at commit [`e4e42c3`](https://github.com/apache/spark/commit/e4e42c35b0236dff6aedf6468a7d94f80bc6023b).


[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69388219
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---
@@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal.create(urlStr, StringType),
+          Literal.create(partToExtract, StringType))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String, urlStr: String,
+        partToExtract: String, key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType),
+          Literal.create(key, StringType))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://userinfo@spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")
+
+    // Null checking
+    checkParseUrl(null, null, "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
+    checkParseUrl(null, null, null)
+    checkParseUrl(null, "test", "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")
+
+    // exceptional cases
+    intercept[java.util.regex.PatternSyntaxException] {
--- End diff --

I think it's fine to throw the exception on the executor side; no need to specially handle the literal here.
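
For context, the behavior under test, assuming Hive-compatible `parse_url` semantics:

```
spark.sql("SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')").show()
// expected: spark.apache.org
spark.sql("SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query')").show()
// expected: 1
```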


[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69388199
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -801,6 +804,49 @@ public static UTF8String concatWs(UTF8String separator, UTF8String... inputs) {
     return res;
   }
 
+  /**
+   * Return a locale of the given language and country, or a default locale when failures occur.
+   */
+  private Locale getLocale(UTF8String language, UTF8String country) {
+    try {
+      return new Locale(language.toString(), country.toString());
+    } finally {
+      return Locale.getDefault();
+    }
+  }
+
+  /**
+   * Splits a string into arrays of sentences, where each sentence is an array of words.
+   */
+  public ArrayList<ArrayList<UTF8String>> sentences(UTF8String language, UTF8String country) {
--- End diff --

hmmm, I'm confused too. @rxin what's the convention here?
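
One note on the quoted `getLocale`: the `try`/`finally` shape means the method always returns `Locale.getDefault()`, because a `return` in `finally` overrides the one in `try`. A sketch of the presumably intended fallback, transliterated to Scala for illustration (not the PR's Java):

```
import java.util.Locale

// Fall back to the default locale only when constructing one actually fails.
def getLocale(language: String, country: String): Locale =
  try new Locale(language, country)
  catch { case _: Exception => Locale.getDefault }
```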


[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
cc: @cloud-fan 


[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-03 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/14036

[SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast

## What changes were proposed in this pull request?
Add IntegerDivide to avoid unnecessary cast

Before:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
CAST((6 / 3) AS BIGINT): bigint
Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / 3) AS BIGINT)#5L]
+- OneRowRelation$
...
```

After:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
(6 / 3): int
Project [(6 / 3) AS (6 / 3)#11]
+- OneRowRelation$
...
```

## How was this patch tested?
Existing tests, plus newly added ones.
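
A sketch of the kind of check the new tests presumably add: with `IntegerDivide`, `div` on integer operands keeps an integral result type instead of casting through double (assumed test, not quoted from the patch):

```
import org.apache.spark.sql.types.IntegerType

val df = spark.sql("SELECT 6 div 3")
assert(df.schema.head.dataType == IntegerType)  // previously BIGINT via a double cast
assert(df.head.getInt(0) == 2)
```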

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-16323

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14036


commit 067b788b05846e659615feef8613f8965573f150
Author: Sandeep Singh 
Date:   2016-07-03T09:00:11Z

[SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast

Before:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
CAST((6 / 3) AS BIGINT): bigint
Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / 3) AS BIGINT)#5L]
+- OneRowRelation$
...
```

After:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
(6 / 3): int
Project [(6 / 3) AS (6 / 3)#11]
+- OneRowRelation$
...
```

commit e4e42c35b0236dff6aedf6468a7d94f80bc6023b
Author: Sandeep Singh 
Date:   2016-07-03T09:00:59Z

Merge branch 'master' into SPARK-16323




[GitHub] spark pull request #13967: [SPARK-16278][SPARK-16279][SQL] Implement map_key...

2016-07-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13967


[GitHub] spark issue #13967: [SPARK-16278][SPARK-16279][SQL] Implement map_keys/map_v...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13967
  
thanks, merging to master!
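
For context, a usage sketch of the merged functions, assuming Hive-compatible semantics:

```
spark.sql("SELECT map_keys(map(1, 'a', 2, 'b')), map_values(map(1, 'a', 2, 'b'))").show()
// expected: keys [1, 2] and values [a, b]
```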


[GitHub] spark issue #14007: [SPARK-16329] [SQL] Star Expansion over Table Containing...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14007
  
and also 1.6 please.


[GitHub] spark issue #14007: [SPARK-16329] [SQL] Star Expansion over Table Containing...

2016-07-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14007
  
thanks, merging to master!

@gatorsmile can you backport it to 2.0 and fix the conflicts? thanks!


[GitHub] spark pull request #14007: [SPARK-16329] [SQL] Star Expansion over Table Con...

2016-07-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14007


[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14004
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61680/
Test PASSed.


[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14004
  
Merged build finished. Test PASSed.


[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14004
  
**[Test build #61680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61680/consoleFull)** for PR 14004 at commit [`ea75373`](https://github.com/apache/spark/commit/ea753738b153d15cc5c75eea88a8b86ad79d1d7b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function

2016-07-03 Thread janplus
Github user janplus commented on the issue:

https://github.com/apache/spark/pull/14008
  
@dongjoon-hyun Thank you for the review.
I have added a new commit which does the following:

1. Implements **Literal** `key` validation on the `Driver` side.
2. Corrects some failure messages.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-03 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13704
  
@cloud-fan, regarding the ```cast```: I think you are correct. When we insert a ```cast```, I now pass the ```ArrayType.containsNull``` information to ```Cast```; in this case it is ```false```. As a result, the logical optimizations eliminate the ```Cast``` from the plan tree, as shown below. I updated the description of the PR.

```
== Analyzed Logical Plan ==
value: double
SerializeFromObject [input[0, double, true] AS value#6]
+- MapElements , obj#5: double
   +- DeserializeToObject cast(value#1 as array<double>).toDoubleArray, obj#4: [D
  +- LocalRelation [value#1]

== Optimized Logical Plan ==
SerializeFromObject [input[0, double, true] AS value#6]
+- MapElements , obj#5: double
   +- DeserializeToObject value#1.toDoubleArray, obj#4: [D
  +- LocalRelation [value#1]
```
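
For reference, a minimal program that reproduces plans of this shape (my own reconstruction, assuming a live `spark` session; not the exact test case of this PR):

```scala
// A typed map over an array<double> column: the analyzed plan deserializes
// via cast(value#1 as array<double>), and since containsNull is false the
// optimizer can now eliminate the Cast.
import spark.implicits._

val ds = Seq(Array(1.0, 2.0, 3.0)).toDS()
ds.map(a => a.sum).explain(true)  // prints the analyzed and optimized plans
```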


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14034
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61678/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14034
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14034
  
**[Test build #61678 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61678/consoleFull)**
 for PR 14034 at commit 
[`bdf4e56`](https://github.com/apache/spark/commit/bdf4e56f3478bd99d1e92d338a984dba869363dc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14035
  
**[Test build #61681 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61681/consoleFull)**
 for PR 14035 at commit 
[`6fdf290`](https://github.com/apache/spark/commit/6fdf290d682b190f6b0457bb084cebacd105ee00).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386958
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

Thanks, this is my approach. It will give the following message:
> org.apache.spark.sql.AnalysisException: cannot resolve 
'parse_url("http://spark.apache.org/path?", "QUERY", "???")' due to data type 
mismatch: invalid key '???'; line 1 pos 7

Does it look ok to you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386898
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

Then, you can try that in `checkInputDataTypes`. Please refer to the following 
code.


https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L544


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61677/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13704
  
**[Test build #61677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61677/consoleFull)**
 for PR 13704 at commit 
[`77859cf`](https://github.com/apache/spark/commit/77859cf4397b8a5022b93ffa4996203b36dfef1b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386854
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

Oh, sorry for that. I missed that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13976
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13976
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61676/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13976
  
**[Test build #61676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61676/consoleFull)**
 for PR 13976 at commit 
[`e260359`](https://github.com/apache/spark/commit/e26035968c73210dda38e82654fc335390fc6c1e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14032: [Minor][SQL] Replace Parquet deprecations

2016-07-03 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/14032


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14033
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61675/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14033
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14033
  
**[Test build #61675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61675/consoleFull)**
 for PR 14033 at commit 
[`6de93a1`](https://github.com/apache/spark/commit/6de93a1582ac5877a932ea47e86811e228b5c2f6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stack(children: Seq[Expression])`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386603
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,298 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.unsafe.Platform
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  new SparkConf()
+.setMaster("local[1]")
+.setAppName("microbenchmark")
+.set("spark.driver.memory", "3g")
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+
+val intUnsafeArray = new UnsafeArrayData
+var intResult: Int = 0
+val intSize = calculateHeaderPortionInBytes(count) + 4 * count
+val intBuffer = new Array[Byte](intSize)
+Platform.putInt(intBuffer, Platform.BYTE_ARRAY_OFFSET, count)
+intUnsafeArray.pointTo(intBuffer, Platform.BYTE_ARRAY_OFFSET, intSize)
+val readIntArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = intUnsafeArray.numElements
+var sum = 0.toInt
+var i = 0
+while (i < len) {
+  sum += intUnsafeArray.getInt(i)
+  i += 1
+}
+intResult = sum
+n += 1
+  }
+}
+
+val doubleUnsafeArray = new UnsafeArrayData
+var doubleResult: Double = 0
+val doubleSize = calculateHeaderPortionInBytes(count) + 8 * count
+val doubleBuffer = new Array[Byte](doubleSize)
+Platform.putInt(doubleBuffer, Platform.BYTE_ARRAY_OFFSET, count)
+doubleUnsafeArray.pointTo(doubleBuffer, Platform.BYTE_ARRAY_OFFSET, 
doubleSize)
+val readDoubleArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = doubleUnsafeArray.numElements
+var sum = 0.toDouble
+var i = 0
+while (i < len) {
+  sum += doubleUnsafeArray.getDouble(i)
+  i += 1
+}
+doubleResult = sum
+n += 1
+  }
+}
+
+val benchmark = new Benchmark("Read UnsafeArrayData", count * iters)
+benchmark.addCase("Int")(readIntArray)
+benchmark.addCase("Double")(readDoubleArray)
+benchmark.run
+/*
+Without SPARK-15962
+OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 4.0.4-301.fc22.x86_64
+Intel Xeon E3-12xx v2 (Ivy Bridge)
+Read UnsafeArrayData:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------
+Int                                   370 /  471        454.0           2.2       1.0X
+Double                                351 /  466        477.5           2.1       1.1X
+*/
+/*
+With SPARK-15962
--- End diff --

Sure, removed the result without this PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386599
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -25,30 +25,36 @@
 import org.apache.spark.sql.types.*;
 import org.apache.spark.unsafe.Platform;
 import org.apache.spark.unsafe.array.ByteArrayMethods;
+import org.apache.spark.unsafe.bitset.BitSetMethods;
 import org.apache.spark.unsafe.hash.Murmur3_x86_32;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
 /**
  * An Unsafe implementation of Array which is backed by raw memory instead 
of Java objects.
  *
- * Each tuple has three parts: [numElements] [offsets] [values]
+ * Each tuple has four parts: [numElements][null bits][values or 
offset][variable length portion]
  *
  * The `numElements` is 4 bytes storing the number of elements of this 
array.
  *
- * In the `offsets` region, we store 4 bytes per element, represents the 
relative offset (w.r.t. the
- * base address of the array) of this element in `values` region. We can 
get the length of this
- * element by subtracting next offset.
- * Note that offset can by negative which means this element is null.
+ * In the `null bits` region, we store 1 bit per element, represents 
whether a element has null
+ * Its total size is ceil(numElements / 8) bytes, and  it is aligned to 
8-byte word boundaries.
  *
- * In the `values` region, we store the content of elements. As we can get 
length info, so elements
- * can be variable-length.
+ * In the `values or offset` region, we store the content of elements. For 
fields that hold
+ * fixed-length primitive types, such as long, double, or int, we store 
the value directly
+ * in the field. For fields with non-primitive or variable-length values, 
we store a relative
+ * offset (w.r.t. the base address of the row) that points to the 
beginning of the variable-length
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14035
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61679/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14035
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386595
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -317,7 +301,6 @@ public boolean equals(Object other) {
 }
 return false;
   }
-
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386593
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -341,63 +324,113 @@ public UnsafeArrayData copy() {
 return arrayCopy;
   }
 
-  public static UnsafeArrayData fromPrimitiveArray(int[] arr) {
-if (arr.length > (Integer.MAX_VALUE - 4) / 8) {
-  throw new UnsupportedOperationException("Cannot convert this array 
to unsafe format as " +
-"it's too big.");
-}
+  @Override
+  public boolean[] toBooleanArray() {
+int size = numElements();
+boolean[] values = new boolean[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.BYTE_ARRAY_OFFSET, size);
+return values;
+  }
+
+  @Override
+  public byte[] toByteArray() {
+int size = numElements();
+byte[] values = new byte[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.BYTE_ARRAY_OFFSET, size);
+return values;
+  }
+
+  @Override
+  public short[] toShortArray() {
+int size = numElements();
+short[] values = new short[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.SHORT_ARRAY_OFFSET, size * 2);
+return values;
+  }
 
-final int offsetRegionSize = 4 * arr.length;
-final int valueRegionSize = 4 * arr.length;
-final int totalSize = 4 + offsetRegionSize + valueRegionSize;
-final byte[] data = new byte[totalSize];
+  @Override
+  public int[] toIntArray() {
+int size = numElements();
+int[] values = new int[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.INT_ARRAY_OFFSET, size * 4);
+return values;
+  }
 
-Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, arr.length);
+  @Override
+  public long[] toLongArray() {
+int size = numElements();
+long[] values = new long[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.LONG_ARRAY_OFFSET, size * 8);
+return values;
+  }
 
-int offsetPosition = Platform.BYTE_ARRAY_OFFSET + 4;
-int valueOffset = 4 + offsetRegionSize;
-for (int i = 0; i < arr.length; i++) {
-  Platform.putInt(data, offsetPosition, valueOffset);
-  offsetPosition += 4;
-  valueOffset += 4;
+  @Override
+  public float[] toFloatArray() {
+int size = numElements();
+float[] values = new float[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.FLOAT_ARRAY_OFFSET, size * 4);
+return values;
+  }
+
+  @Override
+  public double[] toDoubleArray() {
+int size = numElements();
+double[] values = new double[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.DOUBLE_ARRAY_OFFSET, size * 8);
+return values;
+  }
+
+  private static UnsafeArrayData fromPrimitiveArray(Object arr, int 
length, final int elementSize) {
+final int headerSize = calculateHeaderPortionInBytes(length);
+if (length > (Integer.MAX_VALUE - headerSize) / elementSize) {
+  throw new UnsupportedOperationException("Cannot convert this array 
to unsafe format as " +
+"it's too big.");
 }
 
+final int valueRegionSize = elementSize * length;
+final byte[] data = new byte[valueRegionSize + headerSize];
+
+Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, length);
 Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data,
--- End diff --

Definitely yes, done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14035
  
**[Test build #61679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61679/consoleFull)**
 for PR 14035 at commit 
[`54c27d4`](https://github.com/apache/spark/commit/54c27d4d359a7e6ad445856e06f15e29132d582c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386582
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -33,91 +38,147 @@
   // The offset of the global buffer where we start to write this array.
   private int startingOffset;
 
+  // The number of elements in this array
+  private int numElements;
+
+  private int headerInBytes;
+
+  private void assertIndexIsValid(int index) {
+assert index >= 0 : "index (" + index + ") should >= 0";
+assert index < numElements : "index (" + index + ") should < " + 
numElements;
+  }
+
   public void initialize(BufferHolder holder, int numElements, int 
fixedElementSize) {
-// We need 4 bytes to store numElements and 4 bytes each element to 
store offset.
-final int fixedSize = 4 + 4 * numElements;
+this.numElements = numElements;
+this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
 this.holder = holder;
 this.startingOffset = holder.cursor;
 
-holder.grow(fixedSize);
-Platform.putInt(holder.buffer, holder.cursor, numElements);
-holder.cursor += fixedSize;
+// Grows the global buffer ahead for header and fixed size data.
+holder.grow(headerInBytes + fixedElementSize * numElements);
+
+// Initialize information in header
+Platform.putInt(holder.buffer, startingOffset, numElements);
+Arrays.fill(holder.buffer, startingOffset + 4 - 
Platform.BYTE_ARRAY_OFFSET,
+  startingOffset + headerInBytes - Platform.BYTE_ARRAY_OFFSET, 
(byte)0);
 
-// Grows the global buffer ahead for fixed size data.
-holder.grow(fixedElementSize * numElements);
+holder.cursor += (headerInBytes + fixedElementSize * numElements);
   }
 
-  private long getElementOffset(int ordinal) {
-return startingOffset + 4 + 4 * ordinal;
+  private long getElementOffset(int ordinal, int scale) {
+return startingOffset + headerInBytes + ordinal * scale;
+  }
+
+  public void setOffsetAndSize(int ordinal, long currentCursor, long size) 
{
+final long relativeOffset = currentCursor - startingOffset;
+final long offsetAndSize = (relativeOffset << 32) | size;
+
+write(ordinal, offsetAndSize);
   }
 
   public void setNullAt(int ordinal) {
-final int relativeOffset = holder.cursor - startingOffset;
-// Writes negative offset value to represent null element.
-Platform.putInt(holder.buffer, getElementOffset(ordinal), 
-relativeOffset);
+throw new UnsupportedOperationException("setNullAt() is not 
supported");
--- End diff --

Removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386586
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -33,91 +38,147 @@
   // The offset of the global buffer where we start to write this array.
   private int startingOffset;
 
+  // The number of elements in this array
+  private int numElements;
+
+  private int headerInBytes;
+
+  private void assertIndexIsValid(int index) {
+assert index >= 0 : "index (" + index + ") should >= 0";
+assert index < numElements : "index (" + index + ") should < " + 
numElements;
+  }
+
   public void initialize(BufferHolder holder, int numElements, int 
fixedElementSize) {
-// We need 4 bytes to store numElements and 4 bytes each element to 
store offset.
-final int fixedSize = 4 + 4 * numElements;
+this.numElements = numElements;
+this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
 this.holder = holder;
 this.startingOffset = holder.cursor;
 
-holder.grow(fixedSize);
-Platform.putInt(holder.buffer, holder.cursor, numElements);
-holder.cursor += fixedSize;
+// Grows the global buffer ahead for header and fixed size data.
+holder.grow(headerInBytes + fixedElementSize * numElements);
+
+// Initialize information in header
+Platform.putInt(holder.buffer, startingOffset, numElements);
+Arrays.fill(holder.buffer, startingOffset + 4 - 
Platform.BYTE_ARRAY_OFFSET,
+  startingOffset + headerInBytes - Platform.BYTE_ARRAY_OFFSET, 
(byte)0);
 
-// Grows the global buffer ahead for fixed size data.
-holder.grow(fixedElementSize * numElements);
+holder.cursor += (headerInBytes + fixedElementSize * numElements);
   }
 
-  private long getElementOffset(int ordinal) {
-return startingOffset + 4 + 4 * ordinal;
+  private long getElementOffset(int ordinal, int scale) {
--- End diff --

Sure, done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386564
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

I get your point. Since the implementation here would still be on the 
`Executor` side, I'll try to find another way to do this.
Thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386574
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -33,91 +38,147 @@
   // The offset of the global buffer where we start to write this array.
   private int startingOffset;
 
+  // The number of elements in this array
+  private int numElements;
+
+  private int headerInBytes;
+
+  private void assertIndexIsValid(int index) {
+assert index >= 0 : "index (" + index + ") should >= 0";
+assert index < numElements : "index (" + index + ") should < " + 
numElements;
+  }
+
   public void initialize(BufferHolder holder, int numElements, int 
fixedElementSize) {
-// We need 4 bytes to store numElements and 4 bytes each element to 
store offset.
-final int fixedSize = 4 + 4 * numElements;
+this.numElements = numElements;
+this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
 this.holder = holder;
 this.startingOffset = holder.cursor;
 
-holder.grow(fixedSize);
-Platform.putInt(holder.buffer, holder.cursor, numElements);
-holder.cursor += fixedSize;
+// Grows the global buffer ahead for header and fixed size data.
+holder.grow(headerInBytes + fixedElementSize * numElements);
+
+// Initialize information in header
+Platform.putInt(holder.buffer, startingOffset, numElements);
+Arrays.fill(holder.buffer, startingOffset + 4 - 
Platform.BYTE_ARRAY_OFFSET,
--- End diff --

Replaced this with ```Platform.putLong``` as ```zeroOutNullBytes``` does.
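
For context, a rough sketch of that word-sized zeroing (a reconstruction in the spirit of `UnsafeRowWriter.zeroOutNullBytes`, not the exact commit; it assumes the null-bits region after the 4-byte `numElements` field spans whole 8-byte words):

```scala
import org.apache.spark.unsafe.Platform

// Zero the null-bit words of the array header with 8-byte stores instead of
// Arrays.fill. `startingOffset` is an absolute offset into `buffer`, as in
// UnsafeArrayWriter above.
def zeroOutNullBits(buffer: Array[Byte], startingOffset: Long, headerInBytes: Int): Unit = {
  var offset = startingOffset + 4  // skip the 4-byte numElements field
  while (offset < startingOffset + headerInBytes) {
    Platform.putLong(buffer, offset, 0L)
    offset += 8
  }
}
```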


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386561
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -19,9 +19,14 @@
 
 import org.apache.spark.sql.types.Decimal;
 import org.apache.spark.unsafe.Platform;
+import org.apache.spark.unsafe.bitset.BitSetMethods;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
+import java.util.Arrays;
--- End diff --

Removed this import


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386554
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -192,26 +192,30 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 val fixedElementSize = et match {
--- End diff --

Updated the name


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386552
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -192,26 +192,30 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 val fixedElementSize = et match {
   case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
   case _ if ctx.isPrimitiveType(jt) => et.defaultSize
-  case _ => 0
+  case _ => 8
--- End diff --

Added a comment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386548
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,298 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.unsafe.Platform
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  new SparkConf()
+.setMaster("local[1]")
+.setAppName("microbenchmark")
+.set("spark.driver.memory", "3g")
--- End diff --

removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386546
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -126,11 +187,11 @@ public void write(int ordinal, Decimal input, int 
precision, int scale) {
 // Write the bytes to the variable length portion.
 Platform.copyMemory(
   bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, 
bytes.length);
-setOffset(ordinal);
+write(ordinal, ((long)(holder.cursor - startingOffset) << 32) | 
((long) bytes.length));
--- End diff --

yes, done
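
As a side note, the packing in the quoted line is the usual 64-bit offset-and-size encoding (relative offset in the high 32 bits, byte length in the low 32). A minimal standalone illustration:

```scala
// Pack a relative offset and a byte length into one long, as the quoted
// write(ordinal, offsetAndSize) call does.
def pack(relativeOffset: Int, size: Int): Long =
  (relativeOffset.toLong << 32) | (size.toLong & 0xFFFFFFFFL)

// Unpack on the read side.
def unpackOffset(offsetAndSize: Long): Int = (offsetAndSize >> 32).toInt
def unpackSize(offsetAndSize: Long): Int = offsetAndSize.toInt  // low 32 bits
```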


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386542
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -33,91 +38,147 @@
   // The offset of the global buffer where we start to write this array.
   private int startingOffset;
 
+  // The number of elements in this array
+  private int numElements;
+
+  private int headerInBytes;
+
+  private void assertIndexIsValid(int index) {
+assert index >= 0 : "index (" + index + ") should >= 0";
+assert index < numElements : "index (" + index + ") should < " + 
numElements;
+  }
+
   public void initialize(BufferHolder holder, int numElements, int 
fixedElementSize) {
-// We need 4 bytes to store numElements and 4 bytes each element to 
store offset.
--- End diff --

Added a comment regarding 4 bytes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69386534
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,298 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.unsafe.Platform
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  new SparkConf()
+.setMaster("local[1]")
+.setAppName("microbenchmark")
+.set("spark.driver.memory", "3g")
--- End diff --

Removed





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

It's not a general pattern-matching problem. You know, it's the `key=value` 
part of the URL, isn't it?
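
For readers following along, a minimal sketch of the `key=value` extraction under discussion, assembled from the `REGEXPREFIX`/`REGEXSUBFIX` pieces in the quoted diff (the object and method names here are illustrative, not necessarily those in the patch):

```scala
import java.util.regex.Pattern

object QueryKeySketch {
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"

  // The key is spliced into the regex unescaped, mirroring the quoted diff;
  // a key containing regex metacharacters would make Pattern.compile throw
  // PatternSyntaxException, which is the failure mode debated in this thread.
  def getPattern(key: String): Pattern =
    Pattern.compile(REGEXPREFIX + key + REGEXSUBFIX)

  def extractValue(query: String, key: String): Option[String] = {
    val m = getPattern(key).matcher(query)
    if (m.find()) Option(m.group(2)) else None
  }

  def main(args: Array[String]): Unit = {
    assert(extractValue("query=1", "query") == Some("1"))
    assert(extractValue("a=2&query=1", "query") == Some("1"))
    assert(extractValue("a=2", "query") == None)
  }
}
```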





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386305
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure("parse_url function requires two or 
three arguments")
--- End diff --

OK, thank you.





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386280
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

BTW, with the implementation here it seems the validation failure still 
happens on the `Executor` side. IMO, an invalid key value can hardly be 
considered an `AnalysisException`, just as with the `rlike` function, can it?





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386249
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure("parse_url function requires two or 
three arguments")
--- End diff --

And here, let's use `$prettyName` instead of `parse_url`. I mean:
```scala
TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or 
three arguments")
```





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69386214
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

Yep. I think so.
1. `PatternSyntaxException` is the result of an `Executor`-side failure. It 
means it could be any exception.
2. `AnalysisException` is the result of `Driver`-side static analysis.
We need to do our best to prevent errors of type 1.
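
To make the distinction concrete, here is a sketch (under the assumption that the key is a constant literal; this is not the patch itself) of validating the pattern on the `Driver` at analysis time, so a malformed key is reported up front instead of failing mid-query on an `Executor`:

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

object DriverSideCheckSketch {
  // Compile a constant key's pattern eagerly: a malformed key is reported
  // during analysis rather than surfacing as a PatternSyntaxException on an
  // Executor at runtime.
  def checkConstantKey(key: String): Either[String, Pattern] =
    try Right(Pattern.compile("(&|^)" + key + "=([^&]*)"))
    catch {
      case e: PatternSyntaxException =>
        Left(s"invalid key '$key': ${e.getMessage}")
    }

  def main(args: Array[String]): Unit = {
    assert(checkConstantKey("query").isRight) // well-formed key passes
    assert(checkConstantKey("(").isLeft)      // unclosed group fails fast
  }
}
```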





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13680
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61673/
Test PASSed.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13680
  
**[Test build #61673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61673/consoleFull)**
 for PR 13680 at commit 
[`243252a`](https://github.com/apache/spark/commit/243252a460794c2b6e2dff3757e421e2532e87bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14004
  
**[Test build #61680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61680/consoleFull)**
 for PR 14004 at commit 
[`ea75373`](https://github.com/apache/spark/commit/ea753738b153d15cc5c75eea88a8b86ad79d1d7b).

