[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-29 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50443929
  
@ueshin   That is not what the title reads. Here is the title: 

Add Length support to Spark SQL and HQL and Strlen support to SQ


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-29 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50442618
  
First, I would like to confirm: which do you want to add to HQL, 
`Length` or `Strlen`?
The title of this PR says to add `Length` to HQL, but the implementation 
adds `Strlen`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50439423
  
@ueshin The length applies to any datatype, as I described in a prior 
comment. As for getBytes, I am following the recommendation of @chenghao-intel: 
 
   I think the following is probably what you want

// assume s1 is the string
val string = new String(s1.getBytes(), encoding)





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50437674
  
I'm sorry, but now I am confused.
`Length` and `Strlen` look like almost the same implementation.
What do you intend the difference between them to be?

BTW, `getBytes()` without a charset argument should not be used because it 
depends on the environment, e.g. the platform's default charset.
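ueshin's warning is easy to demonstrate. A minimal sketch in plain Java rather than the PR's Scala (the JVM `String` behavior is identical; the class name is illustrative), using the four characters from the PR's test case:

```java
import java.nio.charset.StandardCharsets;

public class CharsetLengthDemo {
    public static void main(String[] args) {
        // The four CJK compatibility ideographs used in the PR's test case.
        String s = "\uF93D\uF936\uF949\uF942";

        // Character count: independent of any encoding.
        System.out.println(s.length()); // 4

        // Byte count: depends entirely on the charset chosen.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);      // 12 (3 bytes per char)
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 4 (each char replaced by '?')

        // getBytes() with no argument uses the platform default charset,
        // so this value varies from machine to machine -- the likely source
        // of the local-vs-Jenkins discrepancy discussed in this thread.
        System.out.println(s.getBytes().length);
    }
}
```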




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50431808
  
@chenghao-intel and @ueshin: I have incorporated your comments, and the 
`ExpressionEvaluationSuite` now passes locally with the corrected expected 
results. I have pushed the changes to my PR; please review again for any 
remaining comments. Thanks for your input!




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50430768
  
@chenghao-intel Thanks for your pointer: that should fix the problem. 
Here is the updated test case that now works on my local machine, and I am 
going to push it to my PR:

checkEvaluation(Strlen(Literal("\uF93D\uF936\uF949\uF942", StringType), 
"ISO-8859-1"), 4)
checkEvaluation(Strlen(Literal("\uF93D\uF936\uF949\uF942", StringType), 
"UTF-8"), 4)
checkEvaluation(Strlen(Literal("\uF93D\uF936\uF949\uF942", StringType), 
"UTF-16"), 2)
checkEvaluation(Strlen(Literal("\uF93D\uF936\uF949\uF942", StringType), 
"UTF-32"), 1)





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15504346
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  throw new IllegalArgumentException(s"Non-string value [$string] 
provided to strlen")
--- End diff --

@javadba I think you probably also need to update the rules in 
`HiveTypeCoercion`, which will insert a `Cast` expression if the child 
expression type is not satisfied; then you won't need the child data type 
checking here.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15504335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

OK, made that change.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50429907
  
@chenghao-intel   OK I see the difference - it is subtle!   I will try your 
suggestion.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503732
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

No, I'm just saying:

```scala
override def nullable = true
```

instead of

```scala
override def nullable = child.nullable
```





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50428615
  
A Chinese character is always 3 bytes in UTF-8, so `Strlen` will always 
return 3 for a single Chinese character; that's probably not right for 
`Strlen` semantically. That's why I said getting the byte length for a 
`string` is probably not right.

I think the following is probably what you want
```
// assume s1 is the string
val string = new String(s1.getBytes(), encoding)
// assume s1 is the bytes
val string = new String(s1, encoding)
```
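What the suggested `new String(s1.getBytes(), encoding)` does is re-decode the string's raw bytes under a different charset. A minimal sketch in plain Java (the class name is illustrative; note that the sketch fixes the charset on both sides, whereas `getBytes()` with no argument uses the platform default):

```java
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    public static void main(String[] args) {
        String s = "\uF93D\uF936\uF949\uF942"; // 4 characters

        // Re-interpreting the 12 UTF-8 bytes as ISO-8859-1 yields one char
        // per byte, so the character count becomes the UTF-8 byte count.
        String reDecoded = new String(s.getBytes(StandardCharsets.UTF_8),
                                      StandardCharsets.ISO_8859_1);
        System.out.println(reDecoded.length()); // 12

        // Decoding the same bytes back as UTF-8 round-trips cleanly.
        String roundTrip = new String(s.getBytes(StandardCharsets.UTF_8),
                                      StandardCharsets.UTF_8);
        System.out.println(roundTrip.length()); // 4
    }
}
```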





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503648
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

OK, I think I got what you are saying. Does the following code look better 
to you?

} else if (!string.isInstanceOf[String]) {
  log.debug(s"Non-string value [$string] provided to strlen")
  null





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503576
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

Ah, this is not related to my prior comment.
In my prior comment I just mentioned that you should remove the unnecessary 
`isInstanceOf`.

After that, you changed the logic to return `null` if the exception is 
thrown, so the nullability of this operator also changed to `true`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50427812
  
@chenghao-intel   

I am not sure why you try to get the bytes length for Strlen. Seems not 
reasonable.

Given that the encoding is a key piece of strlen, I do not understand this 
comment - would you please clarify? Thanks. 




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503443
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

Hi ueshin, based on your prior comment I changed the logic: it now returns 
null instead of throwing the exception. So I do not understand this comment - 
would you please clarify? Thanks.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50427481
  
I am returning to the issue of the two failed tests for strlen:

Actually my local behavior IS correct - I made the mistake of not including 
the encoding when I used the Scala REPL. Here is the corrected invocation in 
the Scala REPL: 

scala> "\uF93D\uF936\uF949\uF942".getBytes("UTF-8").length
res1: Int = 12

Notice that the above returns 12!

For reference, the answer produced on Jenkins is correct for the 
default encoding as well as for ISO-8859-1:

scala> "\uF93D\uF936\uF949\uF942".getBytes("ISO-8859-1").length
res2: Int = 4
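The machine-dependence behind the local-vs-Jenkins difference can be made visible directly. A small sketch in plain Java (class name illustrative): `getBytes()` with no argument is shorthand for encoding with the JVM's default charset, so any test relying on it inherits whatever default the JVM happened to start with.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        String s = "\uF93D\uF936\uF949\uF942";

        // The JVM-wide setting that getBytes() silently uses.
        System.out.println(Charset.defaultCharset());

        // 12 under a UTF-8 default, 4 under an ISO-8859-1-style default --
        // which is why the same test can pass on one machine and fail on another.
        System.out.println(s.getBytes().length);
    }
}
```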







[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50427397
  
```
scala> val a = "\uF93D\uF936\uF949\uF942"
a: String = 

scala> a.getBytes("UTF-8").length
res2: Int = 12

scala> a.length
res4: Int = 4
```
It works the same as what you wrote in `Strlen`
```
string.asInstanceOf[String].getBytes(strEncoding).length
```
I am not sure why you try to get the bytes length for `Strlen`. Seems not 
reasonable.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50427271
  
QA results for PR 1586:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  case class Length(child: Expression) extends UnaryExpression
  case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17323/consoleFull




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503212
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

Hi, now `nullable` becomes `true` because `eval` returns `null` if the 
`UnsupportedEncodingException` is thrown regardless of `child.nullable`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50426681
  
@chenghao-intel Hi, I just ran your suggested command line:

sbt/sbt -Phive=true 'test-only 
org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite'


The results on my local machine are the same (an error) as with the Maven 
command line:

   - Strlen *** FAILED ***
   [info]   Incorrect Evaluation: Strlen(, UTF-8), actual: 12, 
expected: 4(ExpressionEvaluationSuite.scala:129)

So that is useful to know - the invocation of the test does not seem to be 
the issue. I am not sure yet what the root cause of the difference 
(Jenkins/Scala REPL vs. local) is.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50426189
  
@chenghao-intel Thanks for commenting on my testing issue. I am running 
the test as follows:

mvn -Pyarn -Phadoop-2.3 -Phive test 
-DwildcardSuites=org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50426118
  

I just double-checked on my local machine and yes, I can verify the discrepancy 
in behavior. After making the Jenkins-detected adjustment to the expected 
length, I re-ran the one test locally:

checkEvaluation(Strlen(Literal("\uF93D\uF936\uF949\uF942", StringType), 
"UTF-8"), 4)  // changed from 12 to 4

And only LOCALLY does this give the error:

- Strlen *** FAILED ***
  Incorrect Evaluation: Strlen(, UTF-8), actual: 12, expected: 4
(ExpressionEvaluationSuite.scala:129)


This verifies that the local behavior (actual=12) differs from the 
Jenkins/expected behavior (actual=4).

My local behavior is incorrect. I am not sure yet how to handle this.





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50426034
  
Are you running the unit test with sbt or with your IDE? Can you try 
```
sbt/sbt -Phive=true 'test-only 
org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite'
```





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50425496
  

So the Jenkins answer IS the correct one:

scala> "\uF93D\uF936\uF949\uF942".length
res0: Int = 4

The item for me to resolve is: why is the answer coming back as 12 on my 
local machine for the same code and test case?





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50425197
  
Here is the test that is failing (but only on Jenkins not on my local 
testing):

checkEvaluation(Strlen(Literal("\uF93D\uF936\uF949\uF942", StringType), 
"UTF-8"), 12)

So in my local testing the Strlen operation is returning 12, but on Jenkins 
it is apparently returning 4. I do not really understand this yet.





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50424677
  

Ouch! The Jenkins test is failing at a different place. It is not 
immediately apparent why the test passed for me locally and failed on Jenkins. 
I had already double-checked that the remote pull DOES have the same code as on 
my local machine. Further investigation is required on my side and I will 
report back here.

The specific tests failing have to do with the UTF encoding lengths of Chinese 
characters:

- Length *** FAILED ***
[info]   Incorrect Evaluation: 12 AS Optimized(Length(綠虜雷壟))#349, 
actual: 12, expected: 4 (ExpressionEvaluationSuite.scala:129)

- Length *** FAILED ***
[info]   Incorrect Evaluation: Length(綠虜雷壟), actual: 12, expected: 
4 (ExpressionEvaluationSuite.scala:129)




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50424554
  
@javadba, sorry, there is something in the expressions that confuses me, and I 
am not sure we did it correctly. Probably @marmbrus can clarify.

And I know we should keep both `strlen` and `length` in SQL, but I mean we 
should probably keep just a single class in the expressions, say:
```
case class Strlen(child: Expression, encoding: Expression = Literal(StrlenConstants.DefaultEncoding))
  extends UnaryExpression
```
In the parser, we can provide a different encoding or use the default one.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15501955
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
--- End diff --

@marmbrus I've checked the existing expressions, and it seems most of them set 
`EvaluatedType` to `Any`, but from the description in `Expression`, 
`EvaluatedType` should be the narrowest possible type produced when the 
expression is evaluated. I am not sure whether we have set bad examples.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15501825
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null
+} else if (!string.isInstanceOf[String]) {
+  throw new IllegalArgumentException(s"Non-string value [$string] 
provided to strlen")
--- End diff --

@marmbrus do you think it would be more reasonable to put the children's data 
type checking into `resolved`? I am not sure we do the right thing for the 
existing expressions.
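A minimal sketch of what moving the check into `resolved` could look like, using simplified stand-in types rather than Catalyst's real API (an assumption for illustration only):

```scala
// Stand-in types for illustration; Catalyst's Expression API is richer.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

trait Expr {
  def dataType: DataType
  def resolved: Boolean = true
}

// A stand-in column reference carrying only its type.
case class Col(dataType: DataType) extends Expr

// Hypothetical Strlen whose resolution fails early on a non-string child,
// instead of throwing IllegalArgumentException at eval time.
case class Strlen(child: Expr) extends Expr {
  val dataType: DataType = IntegerType
  override def resolved: Boolean = child.dataType == StringType
}
```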




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50421578
  
QA tests have started for PR 1586. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17323/consoleFull




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50421485
  
test this please




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50421458
  
All tests passing

mvn -Pyarn -Phadoop-2.3 -Phive test


[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM .. SUCCESS [2.065s]
[INFO] Spark Project Core  SUCCESS 
[17:30.416s]
[INFO] Spark Project Bagel ... SUCCESS [21.431s]
[INFO] Spark Project GraphX .. SUCCESS 
[2:13.008s]
[INFO] Spark Project ML Library .. SUCCESS 
[5:29.677s]
[INFO] Spark Project Streaming ... SUCCESS 
[7:17.728s]
[INFO] Spark Project Tools ... SUCCESS [3.675s]
[INFO] Spark Project Catalyst  SUCCESS [8.714s]
[INFO] Spark Project SQL . SUCCESS 
[1:56.384s]
[INFO] Spark Project Hive  SUCCESS 
[2:44:50.515s]
[INFO] Spark Project REPL  SUCCESS 
[1:09.897s]
[INFO] Spark Project YARN Parent POM . SUCCESS [2.720s]
[INFO] Spark Project YARN Stable API . SUCCESS [9.891s]
[INFO] Spark Project Assembly  SUCCESS [0.628s]
[INFO] Spark Project External Twitter  SUCCESS [9.825s]
[INFO] Spark Project External Kafka .. SUCCESS [10.803s]
[INFO] Spark Project External Flume .. SUCCESS [24.332s]
[INFO] Spark Project External ZeroMQ . SUCCESS [9.918s]
[INFO] Spark Project External MQTT ... SUCCESS [9.112s]
[INFO] Spark Project Examples  SUCCESS [13.890s]
[INFO] 





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15481405
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
--- End diff --

OK




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15479867
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
--- End diff --

If we make it `IntegerType`, then the input has to be an integer. I have made 
the semantics here such that ANY type may be provided.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15479873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
--- End diff --

OK




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15478469
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  throw new IllegalArgumentException(s"Non-string value [$string] 
provided to strlen")
+} else {
+  var evalEncoding = encoding.eval(input)
+  val strEncoding =
+if (evalEncoding != null) {
+  evalEncoding.toString
+} else {
+  StrlenConstants.DefaultEncoding
+}
+  string.toString.getBytes(strEncoding).length
--- End diff --

I would say that the convention for invalid, usually static, input is to 
throw an exception.  For example, I believe we throw an exception for invalid 
regexes.
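A small sketch of that throw-on-invalid-input convention for the encoding argument, validating the (usually static) charset name up front via `java.nio.charset.Charset` (illustration only, not this PR's code):

```scala
import java.nio.charset.Charset

// Throws for an unsupported charset name rather than failing later,
// mirroring the convention described for invalid regexes. Note that
// Charset.isSupported itself throws IllegalCharsetNameException for
// syntactically illegal names.
def strlen(s: String, encodingName: String): Int = {
  require(Charset.isSupported(encodingName),
    s"Invalid encoding [$encodingName] provided to strlen")
  s.getBytes(encodingName).length
}
```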




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15478342
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  throw new IllegalArgumentException(s"Non-string value [$string] 
provided to strlen")
+} else {
+  var evalEncoding = encoding.eval(input)
+  val strEncoding =
+if (evalEncoding != null) {
+  evalEncoding.toString
+} else {
+  StrlenConstants.DefaultEncoding
+}
+  string.toString.getBytes(strEncoding).length
--- End diff --

I am trying out using just string.getBytes now, per your comment. Re: null 
vs. exception: OK, I can change it to null. That is really a conventions 
decision.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50373079
  
@chenghao-intel Let us keep both length and strlen: they serve different 
purposes. The length operation may be applied to any datatype; the tests show 
examples such as "select max(length(s)) from testData". As far as providing a 
default encoding type, I was unable to make that code work for 
strlen(someString). I want to get this pushed, so let us make a TODO for 
that.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15477783
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
--- End diff --

Hi Ueshin,  I will try it the way you suggest here.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50300320
  
Getting the string length for different character sets is a very useful 
feature. Since most of the code is quite similar between `Length` and 
`StrLen`, can we eliminate `Length` and use `StrLen` with a default character 
set name?




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15446417
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  throw new IllegalArgumentException(s"Non-string value [$string] 
provided to strlen")
+} else {
+  var evalEncoding = encoding.eval(input)
+  val strEncoding =
+if (evalEncoding != null) {
+  evalEncoding.toString
+} else {
+  StrlenConstants.DefaultEncoding
+}
+  string.toString.getBytes(strEncoding).length
--- End diff --

The `child` outputs `StringType` in my original idea of `Strlen`, so the 
`string.toString` here seems a little strange.

By the way, if `strEncoding` is an invalid charset name, should we throw an 
exception or return `null` instead? Returning `null` seems more reasonable to 
me.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15446347
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
--- End diff --

Add `override`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15446335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
--- End diff --

Should be `IntegerType`?




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15445580
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
+} else if (!string.isInstanceOf[String]) {
+  string.toString.length
+} else {
+  string.toString.getBytes.length
+}
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding : Expression) extends 
UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
--- End diff --

Also `asInstanceOf[DataType]` is not needed.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15445571
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: 
Expression)
 }
 
 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+if (string == null) {
+  null.asInstanceOf[DataType]
--- End diff --

Hi, I think `asInstanceOf[DataType]` is not needed.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50289795
  
The updated code got caught by one of the cases in the Hive compatibility 
suite.

The Hive UDF length calculation appears to be different from the newly 
implemented one, presumably due to differences in handling of character 
encoding. For the fix: I will make the length() function use the same 
character encoding as Hive does, to keep it compatible. The strlen() method 
will be the "outlet" permitting flexible handling of multi-byte character 
sets in general RDDs (no strlen method is defined in Hive proper).

I am going to roll back just the Hive portion of the commit, and will report 
back by end of evening.

 udf_length *** FAILED ***
[info]   Results do not match for udf_length:
[info]   SELECT length(dest1.name) FROM dest1
[info]   == Logical Plan ==
[info]   Project [Length(name#41188) AS c_0#41186]
[info]MetastoreRelation default, dest1, None
[info]   
[info]   == Optimized Logical Plan ==
[info]   Project [Length(name#41188) AS c_0#41186]
[info]MetastoreRelation default, dest1, None
[info]   
[info]   == Physical Plan ==
[info]   Project [Length(name#41188:0) AS c_0#41186]
[info]HiveTableScan [name#41188], (MetastoreRelation default, dest1, 
None), None
[info]   c_0
[info]   !== HIVE - 1 row(s) ==   == CATALYST - 1 row(s) ==
[info]   !2   6 (HiveComparisonTest.scala:366)
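The Hive result of 2 versus Catalyst's 6 suggests Hive's length() counts characters while the new expression counts UTF-8 bytes. A sketch of a character-count implementation (an assumption about Hive's semantics for illustration, not code from this PR):

```scala
// Counts Unicode code points, so a 2-character CJK name yields 2 even
// though its UTF-8 encoding is 6 bytes. codePointCount also counts
// supplementary characters (surrogate pairs) once each.
def hiveCompatibleLength(s: String): Int =
  s.codePointCount(0, s.length)
```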




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50284646
  
QA results for PR 1586:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  case class Length(child: Expression) extends UnaryExpression {
  case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression {
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17246/consoleFull




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50281975
  
QA tests have started for PR 1586. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17246/consoleFull




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50281863
  
test this please




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50260024
  
After a fair bit of struggling with testing inconsistencies, Maven, and Git, 
I have the updates in place. Please take a look whenever you have a 
chance - no rush ;)




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50216434
  
It doesn't get to unit tests if the style check fails.
On Jul 25, 2014 3:48 PM, "StephenBoesch"  wrote:

> Thanks for the review Michael! I agree with / will apply all of your
> comments and will re-run with sbt scalastyle . Question: from
> 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17181/consoleFull
> there is a message in the jenkins output saying that unit tests failed. 
But
> I can not find any information on which failed tests. (I had run and 
re-run
> the sql/core and sql/catalyst tests before submitting the PR and they were
> passing)
>




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50213067
  
Thanks for the review Michael!   I agree with / will apply all of your 
comments and will re-run with  sbt scalastyle .  Question: from 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17181/consoleFull
 there is a message in the jenkins output saying that unit tests failed. But I 
can not find any information on which failed tests.  (I had run and re-run  the 
sql/core and sql/catalyst tests before submitting the PR and they were passing)




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50194538
  
BTW, there are also a few style errors.  You can find them locally by 
running `sbt scalastyle`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50121672
  
QA results for PR 1586:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
  trait LengthExpression {
  case class Length(child: Expression) extends UnaryExpression with LengthExpression {
  case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression with LengthExpression {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17181/consoleFull




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50121561
  
QA tests have started for PR 1586. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17181/consoleFull




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50121226
  
Thanks for doing this!  A few minor comments.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15390612
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -988,8 +990,15 @@ private[hive] object HiveQl {
 case Token("TOK_FUNCTION", Token(RAND(), Nil) :: Nil) => Rand
 case Token("TOK_FUNCTION", Token(SUBSTR(), Nil) :: string :: pos :: Nil) =>
   Substring(nodeToExpr(string), nodeToExpr(pos), Literal(Integer.MAX_VALUE, IntegerType))
-case Token("TOK_FUNCTION", Token(SUBSTR(), Nil) :: string :: pos :: length :: Nil) =>
+case Token("TOK_FUNCTION", Token(SUBSTR(), Nil) :: string :: pos :: length :: Nil) =>
   Substring(nodeToExpr(string), nodeToExpr(pos), nodeToExpr(length))
+case Token("TOK_FUNCTION", Token(LENGTH(), Nil) :: string :: Nil) =>
+  Length(nodeToExpr(string))
+//case Token("TOK_FUNCTION", Token(STRLEN(), Nil) :: string :: Nil) =>
--- End diff --

remove




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50121152
  
test this please




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15390556
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
@@ -543,4 +546,46 @@ class ExpressionEvaluationSuite extends FunSuite {
 assert(Substring(s_notNull, Literal(null, IntegerType), Literal(2, IntegerType)).nullable === true)
 assert(Substring(s_notNull, Literal(0, IntegerType), Literal(null, IntegerType)).nullable === true)
   }
+
+  test("Length") {
+val row = new GenericRow(Array[Any](null, 0, 12, 123, 12.4F, 12345678901L, 1234567890.2D, "1234567890ABC", "\uF93D\uF936\uF949\uF942"))
+val ctr = new AtomicInteger(0)
--- End diff --

An `AtomicInteger` is expensive unless you are going to be doing 
`getAndIncrement` from multiple concurrent threads.
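To illustrate the reviewer's point, here is a minimal, self-contained sketch (hypothetical code, not the PR's actual test): in a single-threaded test a plain `var` counter does the same job cheaply, while `AtomicInteger` earns its keep only when several threads increment concurrently.

```scala
import java.util.concurrent.atomic.AtomicInteger

object CounterSketch {
  def main(args: Array[String]): Unit = {
    // In single-threaded test code, a plain var is the cheap, idiomatic counter.
    var ctr = 0
    val indices = (0 until 5).map { _ => val i = ctr; ctr += 1; i }
    println(indices.mkString(","))  // prints 0,1,2,3,4

    // AtomicInteger provides a thread-safe getAndIncrement, which only
    // matters when multiple threads bump the counter concurrently.
    val atomic = new AtomicInteger(0)
    val threads = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 100).foreach(_ => atomic.getAndIncrement())
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(atomic.get())  // prints 400
  }
}
```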




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15390570
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -39,12 +41,15 @@ class QueryTest extends PlanTest {
 def prepareAnswer(answer: Seq[Any]) = if (!isSorted) answer.sortBy(_.toString) else answer
 val sparkAnswer = try rdd.collect().toSeq catch {
   case e: Exception =>
+val sw = new StringWriter
+e.printStackTrace(new PrintWriter(sw))
--- End diff --

Use `org.apache.spark.sql.catalyst.util.stackTraceToString`.
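The utility the reviewer points to boils down to the same `StringWriter`/`PrintWriter` pattern, packaged as a reusable helper. A self-contained sketch (the real helper lives in `org.apache.spark.sql.catalyst.util`; this stand-alone version is for illustration only):

```scala
import java.io.{PrintWriter, StringWriter}

object StackTraceSketch {
  // Roughly what a stackTraceToString helper does: capture the stack
  // trace that printStackTrace would emit, as a String.
  def stackTraceToString(t: Throwable): String = {
    val sw = new StringWriter
    t.printStackTrace(new PrintWriter(sw))
    sw.toString
  }

  def main(args: Array[String]): Unit = {
    val trace = stackTraceToString(new RuntimeException("boom"))
    // The first line is the throwable's toString.
    println(trace.split("\n").head)  // prints java.lang.RuntimeException: boom
  }
}
```

Using the shared helper at the call site keeps the catch block to a single expression instead of three lines of writer plumbing.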




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15390523
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
@@ -543,4 +546,46 @@ class ExpressionEvaluationSuite extends FunSuite {
 assert(Substring(s_notNull, Literal(null, IntegerType), Literal(2, IntegerType)).nullable === true)
 assert(Substring(s_notNull, Literal(0, IntegerType), Literal(null, IntegerType)).nullable === true)
   }
+
+  test("Length") {
+val row = new GenericRow(Array[Any](null, 0, 12, 123, 12.4F, 12345678901L, 1234567890.2D, "1234567890ABC", "\uF93D\uF936\uF949\uF942"))
--- End diff --

I think this test would be easier to read if it was written with Literals.  
For example:

```scala
checkEvaluation(Length(Literal(null, StringType)), null, row)
```

That way the test and the answer are right next to each other.
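The shape of the suggested style can be shown with a stripped-down, self-contained sketch (hypothetical `Expr`/`Literal`/`checkEvaluation` stand-ins, not the suite's real types): each line pairs the expression under test with its expected answer.

```scala
object LiteralTestSketch {
  // Minimal stand-ins for the suite's expression types, for illustration.
  sealed trait Expr { def eval(): Any }
  case class Literal(value: Any) extends Expr { def eval(): Any = value }
  case class Length(child: Expr) extends Expr {
    def eval(): Any = child.eval() match {
      case null      => null
      case s: String => s.length
      case other     => other.toString.length  // length of any value's string form
    }
  }

  // The input and its expected answer sit on the same line, as suggested.
  def checkEvaluation(e: Expr, expected: Any): Unit =
    assert(e.eval() == expected, s"$e evaluated to ${e.eval()}, expected $expected")

  def main(args: Array[String]): Unit = {
    checkEvaluation(Length(Literal(null)), null)
    checkEvaluation(Length(Literal("1234567890ABC")), 13)
    checkEvaluation(Length(Literal(12345)), 5)
    println("ok")
  }
}
```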




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15390507
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -208,6 +208,69 @@ case class EndsWith(left: Expression, right: Expression)
   def compare(l: String, r: String) = l.endsWith(r)
 }
 
+trait LengthExpression {
+  self: UnaryExpression =>
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def eval(input: Row): EvaluatedType = {
--- End diff --

I'd just put this in `Strlen` since that is the only place it is used.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-25 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15390509
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -208,6 +208,69 @@ case class EndsWith(left: Expression, right: Expression)
   def compare(l: String, r: String) = l.endsWith(r)
 }
 
+trait LengthExpression {
+  self: UnaryExpression =>
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def eval(input: Row): EvaluatedType = {
+val string = child.eval(input)
+//println(s"eval=$string")
--- End diff --

Remove commented code.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50112581
  
Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-24 Thread javadba
GitHub user javadba opened a pull request:

https://github.com/apache/spark/pull/1586

SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL

Syntactic, parsing, and operational support have been added for LEN(GTH) 
and STRLEN functions.
Examples:
SQL:
import org.apache.spark.sql._
case class TestData(key: Int, value: String)
val sqlc = new SQLContext(sc)
import sqlc._
val testData: SchemaRDD = sqlc.sparkContext.parallelize(
(1 to 100).map(i => TestData(i, i.toString)))
testData.registerAsTable("testData")
sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect
res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])
HQL:
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc._
hc.hql
hql("select length(grp) from simplex").collect
res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])
As far as codebase changes go: they have been purposefully made similar to the 
ones made for adding SUBSTR(ING) from July 17:
SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the 
main classes changed. The testing suites affected are ConstantFolding and 
ExpressionEvaluation.
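As a rough, self-contained illustration of the Length/Strlen distinction this PR draws (a sketch only; the exact Strlen semantics were still under discussion in the review, and this is not the PR's code): `LENGTH` counts characters of a value's string form for any datatype, while `STRLEN` takes an encoding argument, read here as the byte length of the string under that encoding via `getBytes`.

```scala
import java.nio.charset.Charset

object LengthVsStrlenSketch {
  // Character count of a value's string representation (applies to any datatype).
  def length(v: Any): Any = if (v == null) null else v.toString.length

  // Byte length of a string re-encoded in the given charset -- one plausible
  // reading of the getBytes-based approach discussed in the review.
  def strlen(s: String, encoding: String): Int =
    s.getBytes(Charset.forName(encoding)).length

  def main(args: Array[String]): Unit = {
    println(length(12345))            // prints 5
    println(length("ABC"))            // prints 3
    println(strlen("ABC", "UTF-8"))   // prints 3
    println(strlen("ABC", "UTF-16"))  // prints 8 (2-byte BOM + 2 bytes per char)
  }
}
```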


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/javadba/spark strlen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1586


commit bb252380399c4146bb63b5d6cbc66234609bab11
Author: Stephen Boesch 
Date:   2014-07-12T12:34:58Z

Support hbase-0.96-1.1 in SparkBuild

commit 947007305cb03515daa8738d3ad2063bcd226a3d
Author: Stephen Boesch 
Date:   2014-07-12T12:56:38Z

overwrote sparkbuild

commit 9b6a6471e3c1f087c186a7597c63c7ef2707eaa3
Author: Stephen Boesch 
Date:   2014-07-16T13:24:32Z

update pom.xml for hadoop-2.3-cdh50.0 and hbase 0.96.1.1

commit b04c4cbef3ecb5a6f13297391b55a36317ce957a
Author: Stephen Boesch 
Date:   2014-07-16T13:24:40Z

Merge branch 'master' of https://github.com/apache/spark

commit 5d1cb0a449bbf1ea95272a45f2d030d5cad0195c
Author: Stephen Boesch 
Date:   2014-07-23T04:33:25Z

SPARK-2638 MapOutputTracker concurrency improvement

commit 483479ac8ccb0c937da5d306fc4591aa974ed37b
Author: Stephen Boesch 
Date:   2014-07-23T16:09:26Z

Mesos workaround

commit 30910b2daac974cd2dac82e8a1b20cd60348a632
Author: Stephen Boesch 
Date:   2014-07-23T19:43:59Z

Merge remote-tracking branch 'upstream/master'

commit 7c675f8d8fc63c5f602c5a767e1215118e0f768c
Author: Stephen Boesch 
Date:   2014-07-23T20:03:18Z

Merge branch 'master' of https://github.com/javadba/spark

commit d646a2e1113252d1955185e355da06ddb690b75f
Author: Stephen Boesch 
Date:   2014-07-25T06:26:11Z

SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL



