[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2017-04-17 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
Glad to see this being merged. Let me know if there is anything more I can 
do to help with any issues.





[GitHub] spark pull request #17230: [SPARK-19353][CORE] Generalize PipedRDD to use I/...

2017-03-27 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/17230#discussion_r108307572
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
@@ -198,17 +183,114 @@ private[spark] class PipedRDD[T: ClassTag](
 val t = childThreadException.get()
 if (t != null) {
   val commandRan = command.mkString(" ")
-  logError(s"Caught exception while running pipe() operator. 
Command ran: $commandRan. " +
-s"Exception: ${t.getMessage}")
-  proc.destroy()
+  logError("Caught exception while running pipe() operator. " +
+  s"Command ran: $commandRan.", t)
   cleanup()
+  proc.destroy()
   throw t
 }
   }
 }
   }
 }
 
+/** Specifies how to write the elements of the input [[RDD]] into the pipe. */
+trait InputWriter[T] extends Serializable {
+  def write(dos: DataOutput, elem: T): Unit
+}
+
+/** Specifies how to read the elements from the pipe into the output [[RDD]]. */
+trait OutputReader[T] extends Serializable {
+  /**
+   * Reads the next element.
+   *
+   * The input is guaranteed to have at least one byte.
+   */
+  def read(dis: DataInput): T
+}
+
+class TextInputWriter[I](
+encoding: String = Codec.defaultCharsetCodec.name,
+printPipeContext: (String => Unit) => Unit = null,
+printRDDElement: (I, String => Unit) => Unit = null
+) extends InputWriter[I] {
+
+  private[this] val lineSeparator = System.lineSeparator().getBytes(encoding)
+  private[this] var initialized = printPipeContext == null
+
+  private def writeLine(dos: DataOutput, s: String): Unit = {
+dos.write(s.getBytes(encoding))
+dos.write(lineSeparator)
+  }
+
+  override def write(dos: DataOutput, elem: I): Unit = {
+if (!initialized) {
+  printPipeContext(writeLine(dos, _))
+  initialized = true
+}
+
+if (printRDDElement == null) {
+  writeLine(dos, String.valueOf(elem))
+} else {
+  printRDDElement(elem, writeLine(dos, _))
+}
+  }
+}
+
+class TextOutputReader(
+encoding: String = Codec.defaultCharsetCodec.name
+) extends OutputReader[String] {
+
+  private[this] val lf = "\n".getBytes(encoding)
+  private[this] val cr = "\r".getBytes(encoding)
+  private[this] val crlf = cr ++ lf
+  private[this] var buf = Array.ofDim[Byte](64)
+  private[this] var used = 0
+
+  @inline
+  /** Checks that the suffix of [[buf]] matches [[other]]. */
+  private def endsWith(other: Array[Byte]): Boolean = {
+var i = used - 1
+var j = other.length - 1
+(j <= i) && {
+  while (j >= 0) {
+if (buf(i) != other(j)) {
+  return false
+}
+i -= 1
+j -= 1
+  }
+  true
+}
+  }
+
+  override def read(dis: DataInput): String = {
--- End diff --

Could a `dis.readLine()` be used here and would it be more efficient?
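For reference, a hedged sketch of the trade-off behind that question (based on the general `java.io.DataInput` contract, not on this PR): `readLine()` assembles the line by zero-extending each byte to a char, so it effectively assumes a single-byte Latin-1 encoding and would bypass the configurable `encoding` used by `TextOutputReader`.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}

// DataInputStream.readLine() is deprecated because it does not convert bytes to
// characters properly: each byte is zero-extended to a char.
val bytes = "héllo\n".getBytes("UTF-8")
val dis = new DataInputStream(new ByteArrayInputStream(bytes))
println(dis.readLine()) // prints "hÃ©llo", not "héllo": the multi-byte UTF-8 sequence is mangled
```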





[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...

2016-12-12 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/16247
  
jenkins, retest this please





[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...

2016-12-08 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/16210
  
+1 to implementing it in the command builder. However, I'm not sure whether the command option 
`-usejavacp` or the property `scala.usejavacp` is the suggested way to add the Java class path 
to the REPL. I skimmed through the Scala compiler's code and it seems the two do slightly 
different things. I'll check this out in further detail.
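
For reference, a minimal sketch of the compiler-side flag (assuming the Scala 2.11 `scala.tools.nsc` API; this is what `-usejavacp` toggles when embedding a REPL, shown here only to illustrate the option under discussion):

```scala
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.ILoop

// Put the JVM's java.class.path on the interpreter classpath.
val settings = new Settings
settings.usejavacp.value = true
new ILoop().process(settings)
```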





[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...

2016-12-08 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/16210
  
The command-line parameter `-Dscala.usejavacp` is passed directly to the JVM and should not be 
OS-dependent. I'm wondering whether the issue instead arises from the way the parameter is 
defined in the spark-shell launcher scripts. Debian Etch is very old and probably ships an 
older version of bash.





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-31 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
@hvanhovell, I just wanted to add that option 2 from my last comment won't 
work: if we replace `\\` with `` in the unescape function (as is done for 
`\%` and `\_`), it will be impossible for a user to specify a single backslash.

Considering that, another potential solution would be to change the ANTLR 
parser to handle escaping differently in LIKE expressions. This would however 
introduce a special case that complicates the overall rules.
Maybe leaving the current behaviour but documenting that backslashes are 
escaped at the parser level is the best option?





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-24 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
@gatorsmile, I updated the PR according to your comments.

Whilst adding the example test
```scala
 checkEvaluation("""%SystemDrive%\Users\John""" like 
"""\%SystemDrive\%\\Users%""", true)
```
I noticed that the equivalent SQL string:
```scala
spark.sql("""'%SystemDrive%\Users\John like '\%SystemDrive\%\\Users%'""")
```
will throw an exception:
`org.apache.spark.sql.AnalysisException: the pattern 
'\%SystemDrive\%\Users%' is invalid, the escape character is not allowed to 
precede 'U';` (Note that a backslash is missing before 'Users')

I traced the issue down to the ANTLR4 parser, which calls 
[unescapeSQLString](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala#L100)
 on the string passed to `spark.sql`, replacing any escaped character except 
for `\_` and `\%`, and thereby removing one backslash before "Users". To solve 
the issue there are two potential solutions, both flawed in my opinion:

1. Escape the backslash twice, requiring 4 backslashes in total: 
`spark.sql("""'%SystemDrive%\Users\John' like '\%SystemDrive\%\\\\Users%'""")`
This is inconsistent, as escaping the percent sign requires only one backslash.

2. Add a special rule to the parser that ignores double backslashes. This won't 
play nicely with custom escape characters in the future.

My question is more fundamental: why are SQL strings escaped in the first 
place? Should it not be up to the frontend language to escape such strings? 
I.e. Scala/Java/Python should handle replacing things such as `\n` with a 
newline, or unicode sequences with their corresponding symbols. 
Getting rid of the extra unescaping would be the cleanest solution to avoid 
double escaping, in my opinion.
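
For concreteness, a sketch of what option 1 above would look like (backslash counts reconstructed from the description of the parser behaviour; not verified against a build):

```scala
// Hypothetical option 1: four backslashes in the SQL text so that exactly one
// literal backslash survives both unescaping passes.
//   in the SQL text:          \%SystemDrive\%\\\\Users%
//   after unescapeSQLString:  \%SystemDrive\%\\Users%
//   seen by the LIKE matcher: literal "%SystemDrive%\Users" followed by a wildcard %
spark.sql("""SELECT '%SystemDrive%\Users\John' LIKE '\%SystemDrive\%\\\\Users%'""")
```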





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84591482
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala 
---
@@ -125,7 +129,7 @@ case class DescribeFunctionCommand(
 
   if (isExtended) {
 result :+
-  Row(s"Extended 
Usage:\n${replaceFunctionName(info.getExtended, info.getName)}")
+  Row(s"Extended Usage:${replaceFunctionName(info.getExtended, 
info.getName)}")
--- End diff --

Indeed, annotations require constant parameters (probably due to JVM 
requirements). Since `stripMargin` is a method on a string wrapper, it 
unfortunately cannot be used in an annotation argument.





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15513
  
@HyukjinKwon (posting here to sum up the inline discussion)
I looked through the way the final description string is created before 
printing, and it looks like the formatting will actually play nicely with extra 
indentation resulting from the use of triple quotes without margin stripping.
So in the end we're just debating a matter of style here, in which case I 
gladly accept your choice.
Thanks for the work, 👍 to the formatting






[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-22 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
@gatorsmile thx for the review. I'll update the test suite as we discussed 
initially :)





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-22 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84582746
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the following " +
+"special symbols:\n\n" +
+"_ matches any one character in the input (similar to . in posix " 
+
+"regular expressions)\n\n" +
+"% matches zero ore more characters in the input (similar to .* in 
" +
+"posix regular expressions)\n\n" +
+"The escape character is '\\'. If an escape character precedes a 
special symbol or " +
+"another escape character, the following character is matched 
literally, For example, " +
+"the expression ` like \\%SystemDrive\\%Users%` will match 
any `` that " +
+"starts with '%SystemDrive%\\Users'. It is invalid to escape any other 
character.\n\n" +
+"Use RLIKE to match with standard regular expressions.")
--- End diff --

ack





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-22 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84582662
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -19,32 +19,34 @@ package org.apache.spark.sql.catalyst.util
 
 import java.util.regex.{Pattern, PatternSyntaxException}
 
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.unsafe.types.UTF8String
 
 object StringUtils {
 
-  // replace the _ with .{1} exactly match 1 time of any character
-  // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
+  /** Convert 'like' pattern to Java regex. */
+  def escapeLikeRegex(str: String): String = {
+val in = str.toIterator
+val out = new StringBuilder()
+
+def fail(message: String) = throw new AnalysisException(
+  s"the pattern '$str' is invalid, $message")
--- End diff --

>  org.apache.spark.sql.AnalysisException: the pattern '\a' is invalid, the 
escape character is not allowed to precede 'a';

or 

> org.apache.spark.sql.AnalysisException: The pattern '\a' is invalid. 
Reason: the escape character is not allowed to precede 'a';
 
Isn't this just a matter of preference? I preferred the first one because 
it avoids sentences that are expected to end in a period, since a semicolon 
is always appended after an analysis exception. It's a really, really 
superficial thing though, so I'm happy to change it either way.





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-22 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15513
  
I agree that the extended part is usually multiple lines; my 
observation is that a newline will be inserted *before* any text, since the 
string starts on the line following `extended="""`. Just wanted to make sure this 
is intended.

Regarding `stripMargin`: unfortunately, `stripMargin` is a method on a 
string wrapper, hence not a constant, and cannot be used in annotation 
arguments. I ran into this myself in #15398.
A workaround I could imagine is to call `stripMargin` here 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L498
 





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-22 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15513
  
More documentation is always great, thanks! Are the newlines at the 
beginning of each expression intentional? E.g.
```
extended = """
   Arguments:
  class - a string literal ...
```
will have a newline before "Arguments". Also, keep in mind that the 
indenting will be treated verbatim in triple quotes. Would it make sense to 
call a `.stripMargin` before printing the extended usage string? That way it 
would be possible to use pipes (|) to have a nice format in source code as well 
as when it is printed to the user.
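
A small, self-contained illustration of that idea (plain Scala, separate from the annotation-constant limitation discussed elsewhere in this thread):

```scala
// The description keeps its pipes for nice indentation in the source; the margin is
// stripped where the string is printed, not in the annotation argument itself.
val extendedFromAnnotation =
  """|Arguments:
     |  class - a string literal ...
     |"""

println(extendedFromAnnotation.stripMargin)
// Arguments:
//   class - a string literal ...
```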





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-21 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
I went ahead and changed the parsing function to throw an AnalysisException 
in case the input escape sequence is invalid. Feel free to revert (or just 
tell me and I'll do it) if we should stick to Hive's behaviour.

Also, I hope an `AnalysisException` is the correct way to report an error 
here; let me know if it should be something else.
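
For reference, a sketch of the intended behaviour (the `escapeLikeRegex` signature is taken from the diff in this PR; the exact exception message may differ):

```scala
import org.apache.spark.sql.catalyst.util.StringUtils

// Valid escapes produce a literal character in the generated Java regex...
StringUtils.escapeLikeRegex("""\%""") // literal '%'
StringUtils.escapeLikeRegex("""\\""") // literal '\'
// ...while an invalid escape sequence now fails analysis instead of matching leniently:
StringUtils.escapeLikeRegex("""\a""")
// org.apache.spark.sql.AnalysisException: the pattern '\a' is invalid, ...
```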





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-19 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
FWIW, I just read this about the standard
> In an ANSI-compliant database, you can only use the LIKE escape character 
to escape a percent sign ( % ), an underscore ( _ ), or the escape character 
itself.


[source](https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_1388.htm).
 I think we should reconsider the need to follow Hive here





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-19 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84143867
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the " +
+"following special symbols:\n\n" +
+"_ matches any one character in the input (similar to . in posix " 
+
+"regular expressions)\n\n" +
+"% matches zero ore more characters in the input (similar to .* in 
" +
+"posix regular expressions\n\n" +
+"The default escape character is '\\'. If an escape character precedes 
a special symbol or " +
+"another escape character, the following character is matched 
literally, otherwise the " +
+ "escape character is treated literally. I.e. '\\%' would match '%', 
whereas '\\a' matches " +
+ "'\\a'.\n\n" +
--- End diff --

good idea





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-19 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
I understand your comment about the weird escaping behaviour, @mengxr. 
Putting myself into the shoes of a new user, I would be least surprised if 
Spark were to treat the String verbatim (as in "anything after a backslash is 
treated literally") or give me an error and tell me that my escape sequence 
doesn't make sense. Changing the parsing of the escape character based on what 
follows also makes the code more complicated.

However, I also understand the high priority of making migration from Hive 
to Spark SQL as straightforward as possible. Therefore, I concluded that existing 
queries should behave the same, and reimplemented Hive's 'like' pattern matching. 
Maybe this wasn't the right trade-off for non-Hive users?

I was unable to find any docs that detail the escaping behaviour in the 
scenarios mentioned above; however, I basically followed the implementation 
from the [link Simon 
provided](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L64).

@rxin, do you think this case warrants diverging from Hive?





[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-18 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15338
  
no objections from me ;)





[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-18 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
Agreed, I'll update the regex parsing to copy Hive's behaviour





[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...

2016-10-17 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15520
  
> Yes, I think so. It just needs more lines :)

I think the solution you propose is sound. Do you think a quick note in a 
comment about avoiding the `BlockingContext` would help, in case someone sees this 
code in the future and is puzzled about what it does?





[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...

2016-10-17 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15520
  
That makes sense, thanks for explaining. Would installing a custom 
`BlockingContext` on the threads that run tasks also work? I'm not very familiar 
with the internals of the `Awaitable` API, so I'm just throwing this out there in 
the hope you can tell me more :)
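
To make that concrete, a hedged sketch of what installing a custom context could look like (the standard-library type is `scala.concurrent.BlockContext`; whether this actually fits Spark's task threads is exactly the open question here):

```scala
import scala.concurrent.{BlockContext, CanAwait}

// A BlockContext that runs the blocking thunk directly instead of notifying the pool
// that the thread is about to block (the notification is what spawns extra threads
// in a ForkJoinPool).
val passThrough = new BlockContext {
  override def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = thunk
}

// Run a body with that context installed on the current thread only.
def withoutManagedBlocking[T](body: => T): T =
  BlockContext.withBlockContext(passThrough)(body)
```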





[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...

2016-10-17 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15520
  
I'm not sure I completely understand why you have to use `Awaitable` 
instead of `Await`. Is it to avoid the [`blocking()` call in 
`Await`](https://github.com/scala/scala/blob/v2.12.0-RC1/src/library/scala/concurrent/package.scala#L168)?
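
For context, a hedged sketch of the distinction (standard-library behaviour, not this PR's code): `Await.result` wraps the wait in `blocking { ... }`, which notifies the thread's `BlockContext`, whereas calling `Awaitable.result` directly skips that notification. `CanAwait` has no public instances, so a cast is the usual workaround:

```scala
import scala.concurrent.{Awaitable, CanAwait}
import scala.concurrent.duration.Duration

// Hypothetical helper: wait on an Awaitable without going through scala.concurrent.blocking.
def resultWithoutBlocking[T](awaitable: Awaitable[T], atMost: Duration): T = {
  val permission = null.asInstanceOf[CanAwait] // CanAwait is only a permission marker
  awaitable.result(atMost)(permission)
}
```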





[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...

2016-10-14 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r83520908
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala
 ---
@@ -74,6 +107,31 @@ class RegexpExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 checkEvaluation("a\nb" like regEx, true, create_row("a%b"))
 
 checkEvaluation(Literal.create(null, StringType) like regEx, null, 
create_row("bc%"))
+
+checkEvaluation("" like regEx, true, create_row(""))
+checkEvaluation("a" like regEx, false, create_row(""))
+checkEvaluation("" like regEx, false, create_row("a"))
+
+checkEvaluation("""""" like regEx, true, create_row("""%\\%"""))
+checkEvaluation("""%%""" like regEx, true, create_row("""%%"""))
+checkEvaluation("""\__""" like regEx, true, create_row("""\\\__"""))
+checkEvaluation("""\\\__""" like regEx, false, 
create_row("""%\\%\%"""))
+checkEvaluation("""_\\\%""" like regEx, false, create_row("""%\\"""))
+
+// scalastyle:off nonascii
+checkEvaluation("a\u20ACa" like regEx, true, create_row("_\u20AC_"))
+checkEvaluation("a€a" like regEx, true, create_row("_€_"))
+checkEvaluation("a€a" like regEx, true, create_row("_\u20AC_"))
+checkEvaluation("a\u20ACa" like regEx, true, create_row("_€_"))
+// scalastyle:on nonascii
+
+// TODO: should throw an exception?
--- End diff --

> The escape character must be a single character no more than two bytes in 
length. It can only appear in the pattern string if it is followed by itself, a 
percent sign, or an underscore.

Does this mean that PostgreSQL's behaviour is wrong? Should Spark throw an 
exception in point 2?





[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-14 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
Thanks for the feedback. It was my plan to add the escape syntax in a 
separate PR. The main questions that I'm concerned with here are the points I 
enumerate in this PR's description, namely:

- should Spark allow ending patterns in the escape character?
- how to handle empty input and/or patterns?
- escaping "regular" characters?






[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-14 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15338
  
You raise a good point about having to commit to supporting environment 
variables in future versions. A command-line option would be preferable.





[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-13 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
The style issues are weird. Running scalastyle in sbt worked fine...





[GitHub] spark issue #15284: [SPARK-17368] [SQL] Add support for value class serializ...

2016-10-13 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15284
  
cc @marmbrus 





[GitHub] spark pull request #15284: [SPARK-17368] [SQL] Add support for value class s...

2016-10-13 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15284#discussion_r83334074
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -628,7 +628,7 @@ object ScalaReflection extends ScalaReflection {
   /*
* Retrieves the runtime class corresponding to the provided type.
*/
-  def getClassFromType(tpe: Type): Class[_] = 
mirror.runtimeClass(tpe.erasure.typeSymbol.asClass)
+  def getClassFromType(tpe: Type): Class[_] = 
mirror.runtimeClass(tpe.typeSymbol.asClass)
--- End diff --

this is the main change





[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-13 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15338
  
LGTM. Let's see if @nchammas has some feedback. He had some very helpful 
comments in #3881





[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...

2016-10-13 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r83329826
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,25 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+var escaping = false
+for (next <- str) {
+  if (escaping) {
+builder ++= Pattern.quote(Character.toString(next))
--- End diff --

I think we're misunderstanding each other; there are so many different 
contexts in which a backslash is used :)

The prefix `'\\'` that matches the escape char here is the *Scala source* 
escape for the literal character `'\'` (if Scala had triple quotes for chars as 
it does for strings, I could have written `'''\'''`). So, in the if statement, 
`\\` will consume one character, a `\`.
The subsequent `\` will be quoted in the `escaping` case, and the final `a` 
will also be quoted in the last case, resulting in `\Q\\E\Qa\E`.
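
A quick sanity check of that quoting behaviour (plain JVM, no Spark needed):

```scala
import java.util.regex.Pattern

// Pattern.quote wraps its argument in \Q...\E so it is matched literally.
val regex = Pattern.quote("\\") + Pattern.quote("a")
println(regex)                // \Q\\E\Qa\E
println("\\a".matches(regex)) // true: the two-character string \a is matched literally
```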





[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-13 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
thanks @yhuai for the feedback! I enumerated some issues in the pull 
request description, do you have any input on them? My personal opinion is that 
we should change the regex matching to conform to point 1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15338#discussion_r82866433
  
--- Diff: sbin/spark-daemon.sh ---
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
 export SPARK_NICENESS=0
 fi
 
+execute_command() {
+  command="$@"
+  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
--- End diff --

This only checks whether the variable is non-empty; setting it to the empty 
string will not work. E.g. `export SPARK_NO_DAEMONIZE=''` will not satisfy the 
condition.

Something like `if [ -z ${SPARK_NO_DAEMONIZE+set} ];` will also handle the 
null-case (see [this SO 
thread](http://stackoverflow.com/questions/3601515/how-to-check-if-a-variable-is-set-in-bash/3601734#3601734)).
Another, more elegant solution would be to use `[[ -v 
SPARK_NO_DAEMONIZE ]]`; however, that requires at least bash 4.2 and is probably 
not available on all systems that Spark runs on.





[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15338#discussion_r82874206
  
--- Diff: sbin/spark-daemon.sh ---
@@ -146,13 +176,11 @@ run_command() {
 
   case "$mode" in
 (class)
-  nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class 
$command "$@" >> "$log" 2>&1 < /dev/null &
-  newpid="$!"
+  execute_command "nice -n \"$SPARK_NICENESS\" 
\"${SPARK_HOME}/bin/spark-class\" $command $@"
   ;;
 
 (submit)
-  nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-submit 
--class $command "$@" >> "$log" 2>&1 < /dev/null &
-  newpid="$!"
+  execute_command "nice -n \"$SPARK_NICENESS\" 
\"${SPARK_HOME}/bin/spark-submit\" --class $command $@"
--- End diff --

same as above





[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15338#discussion_r82874084
  
--- Diff: sbin/spark-daemon.sh ---
@@ -146,13 +176,11 @@ run_command() {
 
   case "$mode" in
 (class)
-  nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class 
$command "$@" >> "$log" 2>&1 < /dev/null &
-  newpid="$!"
+  execute_command "nice -n \"$SPARK_NICENESS\" 
\"${SPARK_HOME}/bin/spark-class\" $command $@"
--- End diff --

the outer-most quotes aren't required without the eval.

`execute_command nice -n "$SPARK_NICENESS" "${SPARK_HOME}/bin/spark-class" 
$command $@` should do the trick





[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15338#discussion_r82875044
  
--- Diff: sbin/spark-daemon.sh ---
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
 export SPARK_NICENESS=0
 fi
 
+execute_command() {
+  command="$@"
--- End diff --

Spark scripts require bash as the interpreter, so we can make this a local 
variable.





[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15338#discussion_r82871978
  
--- Diff: sbin/spark-daemon.sh ---
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
 export SPARK_NICENESS=0
 fi
 
+execute_command() {
+  command="$@"
+  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
+  eval $command
--- End diff --

I don't think eval is required here





[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15338#discussion_r82872476
  
--- Diff: sbin/spark-daemon.sh ---
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
 export SPARK_NICENESS=0
 fi
 
+execute_command() {
+  command="$@"
+  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
+  eval $command
+  else
+  eval "nohup $command >> \"$log\" 2>&1 < /dev/null &"
--- End diff --

same here





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82849408
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,25 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+var escaping = false
+for (next <- str) {
+  if (escaping) {
+builder ++= Pattern.quote(Character.toString(next))
--- End diff --

Every character after a backslash is quoted, so `\\a` becomes `\Q\\E\Qa\E`. 
Is this not the intended behaviour?





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-10 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
cc @viirya, I changed your original code





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-07 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
Every non-java-pattern character is quoted now, updated the StringUtils 
test suite





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-07 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82492398
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,24 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+str.foldLeft(false) { case (escaping, next) =>
--- End diff --

updated in latest commit





[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-07 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
I'll add some more tests





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-07 Thread jodersky
GitHub user jodersky opened a pull request:

https://github.com/apache/spark/pull/15398

[SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patterns.

## What changes were proposed in this pull request?
Modify SQL-to-Java regex escaping to correctly handle cases of double 
backslashes followed by another special character. This fixes cases such as 
'' not matching '%\\%'.

## How was this patch tested?
Extra case in regex unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jodersky/spark SPARK-17647

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15398


commit 42c27180efc2691d3dd117c429077d888b9fe12d
Author: Jakob Odersky <ja...@odersky.com>
Date:   2016-10-08T00:24:22Z

Fix backslash escaping in 'LIKE' patterns.







[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

2016-10-03 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/14087#discussion_r81669340
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -311,6 +311,37 @@ final class DataStreamReader 
private[sql](sparkSession: SparkSession) extends Lo
   @Experimental
   def text(path: String): DataFrame = format("text").load(path)
 
+  /**
+   * Loads text file(s) and returns a [[Dataset]] of String. The 
underlying schema of the Dataset
--- End diff --

Should "text file(s)" be plural here? The API would be more intuitive if it 
copied the non-streaming equivalent, with a vararg method for multiple paths.





[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...

2016-09-30 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15062
  
I'm just unclear why this is needed. According to the linked issue, 
some tests were failing and this fixed them; everything just seems very vague 
to me. It would be very helpful if you could provide an example (it doesn't 
have to be a unit test) that reproduces the error. 
It would greatly help me understand what's going on and also distill things 
down to a regression test.





[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-30 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81402565
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -46,12 +57,17 @@ object Literal {
 case s: String => Literal(UTF8String.fromString(s), StringType)
 case b: Boolean => Literal(b, BooleanType)
 case d: BigDecimal => Literal(Decimal(d), 
DecimalType(Math.max(d.precision, d.scale), d.scale))
-case d: java.math.BigDecimal =>
+case d: JavaBigDecimal =>
   Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), 
d.scale()))
 case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, 
d.scale), d.scale))
 case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), 
TimestampType)
 case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
 case a: Array[Byte] => Literal(a, BinaryType)
+case a: Array[_] =>
--- End diff --

oh, you're right. I didn't realize you specifically wanted an ArrayType


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-29 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81273658
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -46,12 +57,17 @@ object Literal {
 case s: String => Literal(UTF8String.fromString(s), StringType)
 case b: Boolean => Literal(b, BooleanType)
 case d: BigDecimal => Literal(Decimal(d), 
DecimalType(Math.max(d.precision, d.scale), d.scale))
-case d: java.math.BigDecimal =>
+case d: JavaBigDecimal =>
   Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), 
d.scale()))
 case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, 
d.scale), d.scale))
 case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), 
TimestampType)
 case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
 case a: Array[Byte] => Literal(a, BinaryType)
+case a: Array[_] =>
--- End diff --

@maropu, I tested my solution in a shell and it worked. To fix the compilation error, simply make the overloaded method `ScalaReflection.dataTypeFor` public. I don't see any reason for it being private in the first place, apart from that just being the default.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-29 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81257481
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -60,6 +76,45 @@ object Literal {
   }
 
   /**
+   * Returns the Spark SQL DataType for a given class object. Since this 
type needs to be resolved
+   * in runtime, we use match-case idioms for class objects here. However, 
there are similar
+   * functions in other files (e.g., HiveInspectors), so these functions 
need to merged into one.
+   */
+  private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz 
match {
+// primitive types
+case c: Class[_] if c == JavaShort.TYPE => ShortType
--- End diff --

Cool, I saw you updated the match. However, you can remove the instance check everywhere, including further down. A `case c: Class[_]` pattern is essentially equivalent to a `c.isInstanceOf[Class[_]]` check, which is redundant here since the parameter `clz` is already declared as `Class[_]`.
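
To illustrate in isolation (standalone sketch, not the actual `literals.scala` code):

```scala
// Standalone sketch of the simplification: since the parameter is already a Class[_],
// a stable-identifier pattern on the primitive TYPE constants is enough.
import java.lang.{Short => JShort, Integer => JInt}

def componentTypeName(clz: Class[_]): String = clz match {
  // redundant: case c: Class[_] if c == JShort.TYPE => "short"
  case JShort.TYPE => "short"
  case JInt.TYPE   => "int"
  case _           => "other"
}
```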


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-29 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81213697
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -60,6 +76,45 @@ object Literal {
   }
 
   /**
+   * Returns the Spark SQL DataType for a given class object. Since this 
type needs to be resolved
+   * in runtime, we use match-case idioms for class objects here. However, 
there are similar
+   * functions in other files (e.g., HiveInspectors), so these functions 
need to merged into one.
+   */
+  private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz 
match {
+// primitive types
+case c: Class[_] if c == JavaShort.TYPE => ShortType
--- End diff --

Matching against an instance of `Class[_]` is not necessary; a simpler solution is `case JavaShort.TYPE => ShortType`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-29 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81212761
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -46,12 +57,17 @@ object Literal {
 case s: String => Literal(UTF8String.fromString(s), StringType)
 case b: Boolean => Literal(b, BooleanType)
 case d: BigDecimal => Literal(Decimal(d), 
DecimalType(Math.max(d.precision, d.scale), d.scale))
-case d: java.math.BigDecimal =>
+case d: JavaBigDecimal =>
   Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), 
d.scale()))
 case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, 
d.scale), d.scale))
 case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), 
TimestampType)
 case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
 case a: Array[Byte] => Literal(a, BinaryType)
+case a: Array[_] =>
--- End diff --

I tried using `ScalaReflection.dataTypeFor` directly (I first had to make it public, but I think that's a totally valid option). Something like `ScalaReflection.dataTypeFor(rm.classSymbol(a.getClass).selfType)` works perfectly.

I'm not sure why `schemaFor` fails; it may be a bug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-29 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81207545
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -60,6 +70,45 @@ object Literal {
   }
 
   /**
+   * Returns the Spark SQL DataType for a given class object. Since this 
type needs to be resolved
+   * in runtime, we use match-case idioms for class objects here. However, 
there are similar
+   * functions in other files (e.g., HiveInspectors), so these functions 
need to merged into one.
+   */
+  private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz 
match {
+// primitive types
+case c: Class[_] if c == jShort.TYPE => ShortType
--- End diff --

Here is an example:
```scala
case object container { def value: Int = 42 }
case object Container { def value: Int = 42 }

def extract(wrapper: Any) = wrapper match {
  case container => container.value // error: value "value" is not a member of Any
  case Container => Container.value // ok
}
```
It's not a very good example; maybe something like http://stackoverflow.com/a/6576663/917519 will give you a better use case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...

2016-09-28 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15257#discussion_r81033312
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 ---
@@ -60,6 +70,45 @@ object Literal {
   }
 
   /**
+   * Returns the Spark SQL DataType for a given class object. Since this 
type needs to be resolved
+   * in runtime, we use match-case idioms for class objects here. However, 
there are similar
+   * functions in other files (e.g., HiveInspectors), so these functions 
need to merged into one.
+   */
+  private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz 
match {
+// primitive types
+case c: Class[_] if c == jShort.TYPE => ShortType
--- End diff --

The instance matching is not required here; it should be sufficient to use `case jShort.TYPE`.

Btw, this is a nitpick, but I would recommend that classes start with a capital letter, i.e. `java.lang.Short` could be imported as `JShort` rather than `jShort`. In this case it really doesn't matter, but lower-case classes can bite you when you're working with pattern matching.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15078: [SPARK-17523] Support to generate build info file when b...

2016-09-28 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15078
  
How widespread is PowerShell on Windows computers? I assume that modern versions ship with it, but I have also heard that they come with a bash-like shell in the latest version. Does the current build work in that new environment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...

2016-09-28 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15062
  
Oh, I was too quick to comment; I see a complete description is in the JIRA.
It would still be good if you could add a test, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...

2016-09-28 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15062
  
I don't quite understand what use-case this patch fixes. Can you provide an 
example (in the form of a test) that reproduces the issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15284: [SPARK-17368] [SQL] Add support for value class s...

2016-09-28 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15284#discussion_r81025673
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
 ---
@@ -66,8 +66,6 @@ case class RepeatedData(
 mapFieldNull: scala.collection.Map[Int, java.lang.Long],
 structField: PrimitiveData)
 
-case class SpecificCollection(l: List[Int])
--- End diff --

cleanup: found this unused class whilst going over the code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15284: [SPARK-17368] [SQL] Add support for value class s...

2016-09-28 Thread jodersky
GitHub user jodersky opened a pull request:

https://github.com/apache/spark/pull/15284

[SPARK-17368] [SQL] Add support for value class serialization and 
deserialization

## What changes were proposed in this pull request?
Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.

This patch simply removes the erasure step when getting data types for
catalyst.
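
As a rough illustration of the use-case (hypothetical names, not a test from the patch):

```scala
// Illustrative sketch only: a value class field inside a case class, the shape of
// schema this change aims to make encodable.
class UserId(val value: Long) extends AnyVal
case class Event(id: UserId, payload: String)

// Intended usage (commented out; requires a SparkSession and the patched encoders):
// import spark.implicits._
// val events = Seq(Event(new UserId(1L), "click")).toDS()
```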

## How was this patch tested?
Additional tests in `ExpressionEncoderSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jodersky/spark value-classes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15284.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15284


commit a3dce9f5f9ced492fa6f20b93497856fd51fc757
Author: Jakob Odersky <ja...@odersky.com>
Date:   2016-09-28T21:19:26Z

Add support for value class serialization and deserialization

Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.

This patch simply removes the erasure step when getting data types for
catalyst.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...

2016-09-14 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14691
  
I totally agree! This PR LGTM, but I think that enabling truly pluggable sources through providers will require more discussion on a more visible channel. @zsxwing, what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...

2016-09-13 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14691
  
I like the idea! This might not be the best place to start a discussion, but I reckon that the sink provider API could also eventually be used to provision built-in sinks. It would make the current, stringly-typed API optional and provide more compile-time safety.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-13 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15027#discussion_r78643449
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
 ---
@@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: 
SparkSession, path: String)
   // It will fail if there is an existing file (someone has 
committed the batch)
   logDebug(s"Attempting to write log #${batchIdToPath(batchId)}")
   fileManager.rename(tempPath, batchIdToPath(batchId))
+
+  // SPARK-17475: HDFSMetadataLog should not leak CRC files
+  // If the underlying filesystem didn't rename the CRC file, 
delete it.
--- End diff --

ok, looks good then


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-12 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15027#discussion_r78469460
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
 ---
@@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: 
SparkSession, path: String)
   // It will fail if there is an existing file (someone has 
committed the batch)
   logDebug(s"Attempting to write log #${batchIdToPath(batchId)}")
   fileManager.rename(tempPath, batchIdToPath(batchId))
+
+  // SPARK-17475: HDFSMetadataLog should not leak CRC files
+  // If the underlying filesystem didn't rename the CRC file, 
delete it.
--- End diff --

Is this specific to streaming, or would other parts of spark benefit if 
this behavior were changed in the file manager?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-18 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14359
  
Some comments still refer to the use of a queue and should be updated. Other than that, the data structure part now looks good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14359: [SPARK-16719][ML] Random Forests should communica...

2016-08-18 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/14359#discussion_r75361458
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -161,31 +161,43 @@ private[spark] object RandomForest extends Logging {
   None
 }
 
-// FIFO queue of nodes to train: (treeIndex, node)
-val nodeQueue = new mutable.Queue[(Int, LearningNode)]()
+/*
+  FILO queue of nodes to train: (treeIndex, node)
+  We make this FILO by always inserting nodes by appending (+=) and 
removing with dropRight.
--- End diff --

the methods described here still refer to the original queue data structure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14359
  
> I switched to Stack and then realized Stack has been deprecated in Scala 
2.11...

I think you probably read the *immutable* stack docs; the *mutable* stack is not deprecated AFAIK. I can imagine that a custom stack implementation may allow for additional operations in the future, but we should also consider that using standard collections reduces the load for anyone who maintains the code later.

Btw, I highly recommend using the [milestone scaladocs](http://www.scala-lang.org/api/2.12.0-M5/scala/collection/mutable/Stack.html) over the current ones. Although 2.12 is not officially out yet, the changes to the library are minimal and the UI is much more pleasant to use.
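
For reference, the non-deprecated mutable stack looks roughly like this in use (a sketch with a placeholder node type, not the actual RandomForest code):

```scala
import scala.collection.mutable

// Placeholder for the real LearningNode; only the shape of the stack usage matters here.
case class Node(id: Int)

val nodeStack = mutable.Stack.empty[(Int, Node)]   // (treeIndex, node) pairs
nodeStack.push((0, Node(1)))                       // constant-time insert
val (treeIndex, node) = nodeStack.pop()            // constant-time removal of the most recent node
```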


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14359
  
Agree, it's not very obvious. In the latter document I think a `push` is 
akin to `append` and `pop` to `head`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-26 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14359
  
@jkbradley, you can find the scaladoc on stacks here: http://www.scala-lang.org/api/current/index.html#scala.collection.mutable.Stack

Also, this document http://docs.scala-lang.org/overviews/collections/performance-characteristics gives a nice overview of the different collection types in Scala and their performance characteristics.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14359: [SPARK-16719][ML] Random Forests should communica...

2016-07-26 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/14359#discussion_r72322067
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1110,4 +1124,40 @@ private[spark] object RandomForest extends Logging {
 }
   }
 
+  /**
+   * Class for queueing nodes to split on each iteration.
+   * This is a FILO queue.
+   */
+  private[impl] class NodeQueue {
+private var q: mutable.MutableList[(Int, LearningNode)] =
+  mutable.MutableList.empty[(Int, LearningNode)]
+
+/** Put (treeIndex, node) into queue. Constant time. */
+def put(treeIndex: Int, node: LearningNode): Unit = {
+  q += ((treeIndex, node))
+}
+
+/** Remove and return last inserted element.  Linear time (unclear in 
Scala docs though) */
--- End diff --

How critical is this for performance? I know that `append` is O(1) for Scala mutable lists, but `dropRight(1)` is actually implemented in a parent class [1] of `MutableList` and therefore does not take advantage of the list's reference to its last element. All in all, from its implementation it seems that `dropRight` is O(n).

[1] 
https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/LinearSeqOptimized.scala#L194-L204
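
For comparison, a constant-time LIFO can be had by simply prepending to an immutable list (a sketch with a placeholder node type, not the proposed implementation):

```scala
// Placeholder node type; prepending and taking the head are both O(1) on List,
// avoiding the O(n) dropRight(1) on MutableList.
case class Node(id: Int)

var pending: List[(Int, Node)] = Nil
pending = (0, Node(1)) :: pending          // put: O(1) prepend
val (treeIndex, node) = pending.head       // peek at the last-inserted element: O(1)
pending = pending.tail                     // remove it: O(1)
```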


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13729: [SPARK-16008][ML] Remove unnecessary serialization in lo...

2016-07-05 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/13729
  
Hi @dbtsai, I assisted @sethah with some serialization issues during this PR. I know we considered using transient but can't recall exactly why we ended up not using it.
My knowledge of the bigger picture of this PR is quite limited, but one explanation that comes to mind is that the `coefficients` and `featuresStd` parameters are only used within the `add` method, so the reasoning was to keep parameters as local as possible.
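
A minimal sketch of that trade-off (hypothetical names, not the actual aggregator in this PR):

```scala
// Hypothetical sketch: large read-only arrays are passed into add() instead of being stored
// as fields, so they are not dragged into the serialized aggregator state.
class GradientAggregator(numFeatures: Int) extends Serializable {
  private val gradient = new Array[Double](numFeatures)

  def add(features: Array[Double],
          coefficients: Array[Double],
          featuresStd: Array[Double]): this.type = {
    var i = 0
    while (i < numFeatures) {
      // illustrative update only, not the real logistic-regression math
      gradient(i) += features(i) * coefficients(i) / featuresStd(i)
      i += 1
    }
    this
  }
}
```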


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...

2016-06-30 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/13503
  
lgtm


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom

2016-06-03 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/13061
  
jenkins, ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom

2016-06-02 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/13061
  
Last message was a minute too late, so LGTM then.

jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom

2016-06-02 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/13061
  
In the short term we should definitely make the REPL welcome message consistent. Could you consolidate it with the conflict resolution? I don't think there are many changes required. However, I'm also fine with merging this as-is now and submitting a separate JIRA later.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13061: [SPARK-14279] [Build] Pick the spark version from...

2016-06-01 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/13061#discussion_r65465157
  
--- Diff: core/src/main/scala/org/apache/spark/package.scala ---
@@ -41,7 +41,53 @@ package org.apache
  * level interfaces. These are subject to changes or removal in minor 
releases.
  */
 
+import java.util.Properties
+
 package object spark {
-  // For package docs only
-  val SPARK_VERSION = "2.0.0-SNAPSHOT"
+
+  private object SparkBuildInfo {
+val resourceStream = Thread.currentThread().getContextClassLoader.
+  getResourceAsStream("spark-version-info.properties")
+
+val (
+  spark_version: String,
--- End diff --

Oh, this is actually local, so no big deal. If you get a chance to fix it during conflict resolution, that would be great though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom

2016-06-01 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/13061
  
Sorry to bother you again after such a long delay, but I just found another minor style bug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13061: [SPARK-14279] [Build] Pick the spark version from...

2016-06-01 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/13061#discussion_r65464868
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -103,6 +104,9 @@ object SparkSubmit {
/___/ .__/\_,_/_/ /_/\_\   version %s
   /_/
 """.format(SPARK_VERSION))
+printStream.println("Branch %s".format(SPARK_BRANCH))
+printStream.println("Compiled by user %s on 
%s".format(SPARK_BUILD_USER, SPARK_BUILD_DATE))
+printStream.println("Url %s".format(SPARK_REPO_URL))
--- End diff --

It would be great to also add the build info to 
org.apache.spark.repl.SparkILoop


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13061: [SPARK-14279] [Build] Pick the spark version from...

2016-06-01 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/13061#discussion_r65464743
  
--- Diff: core/src/main/scala/org/apache/spark/package.scala ---
@@ -41,7 +41,53 @@ package org.apache
  * level interfaces. These are subject to changes or removal in minor 
releases.
  */
 
+import java.util.Properties
+
 package object spark {
-  // For package docs only
-  val SPARK_VERSION = "2.0.0-SNAPSHOT"
+
+  private object SparkBuildInfo {
+val resourceStream = Thread.currentThread().getContextClassLoader.
+  getResourceAsStream("spark-version-info.properties")
+
+val (
+  spark_version: String,
--- End diff --

Is this purposefully using underscores? `sparkVersion` would be consistent with the general style.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14279] [Build] Pick the spark version f...

2016-05-29 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/13061#issuecomment-222380888
  
What I understand from the conversation is that Maven assumes the target/extra-resources directory:
> Unfortunately I'm not an maven expert but I wonder if we can put it in 
like target/extra-resources and have it automatically picked up still. 
http://maven.apache.org/plugins/maven-resources-plugin/examples/copy-resources.html

I'm suggesting using `resourceManaged` in sbt, since I have had past problems with resources not getting picked up automatically when they were not in that directory.

It's a minor detail, and if everything works for you, then this LGTM too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14279] [Build] Pick the spark version f...

2016-05-27 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/13061#discussion_r64942108
  
--- Diff: project/SparkBuild.scala ---
@@ -436,7 +439,19 @@ object SparkBuild extends PomBuild {
   else x.settings(Seq[Setting[_]](): _*)
 } ++ Seq[Project](OldDeps.project)
   }
+}
 
+object Core {
+  lazy val settings = Seq(
+resourceGenerators in Compile += Def.task {
+  val buildScript = baseDirectory.value + "/../build/spark-build-info"
+  val targetDir = baseDirectory.value + "/target/extra-resources/"
+  val command =  buildScript + " " + targetDir + " " + version.value
+  Process(command).!!
+  val propsFile = baseDirectory.value / "target" / "extra-resources" / 
"spark-version-info.properties"
--- End diff --

Maybe a nit: shouldn't all resources be generated into `resourceManaged`? I'm not sure how sbt will handle packaging of artifacts outside `resourceManaged`; it could be that Spark works locally, but errors may occur in a packaged environment.

I would suggest renaming `"target"/"extra-resources"` to `(resourceManaged in Compile).value`.
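
A sketch of what that could look like in the generator task (assuming sbt 0.13-style syntax; the paths and helper script are taken from the diff above):

```scala
// Hypothetical sbt sketch: generate the properties file under resourceManaged so it is
// packaged and cleaned automatically, then return it so sbt tracks it as a resource.
resourceGenerators in Compile += Def.task {
  val buildScript = baseDirectory.value / ".." / "build" / "spark-build-info"
  val targetDir   = (resourceManaged in Compile).value
  val propsFile   = targetDir / "spark-version-info.properties"
  Process(Seq("/bin/bash", "-c", s"$buildScript $targetDir ${version.value}")).!!
  Seq(propsFile)
}.taskValue
```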


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14279] Spark Version will be picked fro...

2016-05-19 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/12152#discussion_r63970293
  
--- Diff: core/src/main/java/org/apache/spark/VersionInfo.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Properties;
+
+public class VersionInfo {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(VersionInfo.class);
+  private Properties info;
+
+  protected VersionInfo(String component) {
+info = new Properties();
+String versionInfoFile = component + "-version-info.properties";
+InputStream is = null;
+try {
+  is = Thread.currentThread().getContextClassLoader()
+  .getResourceAsStream(versionInfoFile);
+  if (is == null) {
+throw new IOException("Resource not found");
+  }
+  info.load(is);
+} catch (IOException ex) {
+} finally {
+  if (is != null) {
+try {
+  is.close();
+} catch (IOException ioex) {
+}
+  }
+}
+  }
+
+  protected String _getVersion() {
+return info.getProperty("version", "Unknown");
+  }
+
+  protected String _getRevision() {
+return info.getProperty("revision", "Unknown");
+  }
+
+  protected String _getBranch() {
+return info.getProperty("branch", "Unknown");
+  }
+
+  protected String _getDate() {
+return info.getProperty("date", "Unknown");
+  }
+
+  protected String _getUser() {
+return info.getProperty("user", "Unknown");
+  }
+
+  protected String _getUrl() {
+return info.getProperty("url", "Unknown");
+  }
+
+  protected String _getSrcChecksum() {
+return info.getProperty("srcChecksum", "Unknown");
+  }
+
+  protected String _getBuildVersion(){
+return getVersion() +
+" from " + _getRevision() +
+" by " + _getUser() +
+" source checksum " + _getSrcChecksum();
+  }
+
+  private static VersionInfo COMMON_VERSION_INFO = new 
VersionInfo("spark");
--- End diff --

This whole class seems pretty complex. Why are the accessors protected but 
exposed through static methods? Would it not be simpler to just have a static 
getter to a single instance and make all member accessors public? In that case 
I would also recommend looking into implementing this as a class/singleton 
object.
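
To illustrate the suggestion (a stripped-down sketch, not the proposed replacement):

```scala
// Hypothetical sketch: a single Scala object that loads the properties once and exposes
// plain public accessors, instead of protected members mirrored by static methods.
import java.util.Properties

object SparkVersionInfo {
  private val info: Properties = {
    val props = new Properties()
    val is = Thread.currentThread().getContextClassLoader
      .getResourceAsStream("spark-version-info.properties")
    if (is != null) {
      try props.load(is) finally is.close()
    }
    props
  }

  def version: String  = info.getProperty("version", "Unknown")
  def revision: String = info.getProperty("revision", "Unknown")
  def branch: String   = info.getProperty("branch", "Unknown")
  def user: String     = info.getProperty("user", "Unknown")
}
```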


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14279] Spark Version will be picked fro...

2016-05-19 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/12152#discussion_r63969585
  
--- Diff: core/src/main/java/org/apache/spark/VersionInfo.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Properties;
+
+public class VersionInfo {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(VersionInfo.class);
+  private Properties info;
+
+  protected VersionInfo(String component) {
+info = new Properties();
+String versionInfoFile = component + "-version-info.properties";
+InputStream is = null;
+try {
+  is = Thread.currentThread().getContextClassLoader()
+  .getResourceAsStream(versionInfoFile);
+  if (is == null) {
+throw new IOException("Resource not found");
+  }
+  info.load(is);
+} catch (IOException ex) {
+} finally {
+  if (is != null) {
+try {
+  is.close();
+} catch (IOException ioex) {
+}
+  }
+}
+  }
+
+  protected String _getVersion() {
+return info.getProperty("version", "Unknown");
+  }
+
+  protected String _getRevision() {
+return info.getProperty("revision", "Unknown");
+  }
+
+  protected String _getBranch() {
+return info.getProperty("branch", "Unknown");
+  }
+
+  protected String _getDate() {
+return info.getProperty("date", "Unknown");
+  }
+
+  protected String _getUser() {
+return info.getProperty("user", "Unknown");
+  }
+
+  protected String _getUrl() {
+return info.getProperty("url", "Unknown");
+  }
+
+  protected String _getSrcChecksum() {
+return info.getProperty("srcChecksum", "Unknown");
+  }
+
+  protected String _getBuildVersion(){
+return getVersion() +
+" from " + _getRevision() +
+" by " + _getUser() +
+" source checksum " + _getSrcChecksum();
+  }
+
+  private static VersionInfo COMMON_VERSION_INFO = new 
VersionInfo("spark");
+
+  public static String getVersion() {
+return COMMON_VERSION_INFO._getVersion();
+  }
+
+  public static String getRevision() {
+return COMMON_VERSION_INFO._getRevision();
+  }
+
+  public static String getBranch() {
+return COMMON_VERSION_INFO._getBranch();
+  }
+
+  public static String getDate() {
+return COMMON_VERSION_INFO._getDate();
+  }
+
+  public static String getUser() {
+return COMMON_VERSION_INFO._getUser();
+  }
+
+  public static String getUrl() {
+return COMMON_VERSION_INFO._getUrl();
+  }
+
+  public static String getSrcChecksum() {
+return COMMON_VERSION_INFO._getSrcChecksum();
+  }
+
+  public static String getBuildVersion(){
+return COMMON_VERSION_INFO._getBuildVersion();
+  }
+
+  public static void main(String[] args) {
--- End diff --

Is the main method in this utility class ever used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14279] Spark Version will be picked fro...

2016-05-19 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/12152#discussion_r63969372
  
--- Diff: core/src/main/java/org/apache/spark/VersionInfo.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Properties;
+
+public class VersionInfo {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(VersionInfo.class);
--- End diff --

this field is never used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14279] [Build] Pick the spark version f...

2016-05-19 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/13061#discussion_r6396
  
--- Diff: project/SparkBuild.scala ---
@@ -435,7 +438,19 @@ object SparkBuild extends PomBuild {
   else x.settings(Seq[Setting[_]](): _*)
 } ++ Seq[Project](OldDeps.project)
   }
+}
 
+object Core {
+  lazy val settings = Seq(
+unmanagedResourceDirectories in Compile += baseDirectory.value / 
"target" / "extra-resources",
+resourceGenerators in Compile += Def.task {
+  val buildScript = baseDirectory.value + "/../build/spark-build-info"
+  val targetDir = baseDirectory.value + "/target/extra-resources/"
+  val command =  buildScript + " " + targetDir + " " + version.value
+  Process(Seq("/bin/bash", "-c", command)).!!
--- End diff --

Is the extra unmanagedResourceDirectory required? Alternatively, the 
resource generator could output the extra properties file into the 
`resourceManaged` directory (defaults to "resource_managed"), that way it would 
also automatically be picked up in the classpath.

Btw, I think that generating the file into `resourceManaged` would solve 
the problem with clean you reported on the mailing list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14511][Build] Upgrade genjavadoc to lat...

2016-04-28 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12707#issuecomment-215530295
  
@srowen, I did quite a bit of digging through the genjavadoc codebase when re-implementing @mengxr's initial fix. I can try to fix the group warnings and object privacy issues; however, I can't give you a fixed deadline since I won't be available until the 10th.
Since the Spark 2.0 freeze is coming up, do you still want to merge this (since it supports Scala 2.12)? I'll fix it regardless for a future version.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10001][Core] Don't short-circuit action...

2016-04-27 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/12745#discussion_r61370492
  
--- Diff: core/src/main/scala/org/apache/spark/util/SignalUtils.scala ---
@@ -94,7 +94,7 @@ private[spark] object SignalUtils extends Logging {
 
   // run all actions, escalate to parent handler if no action catches 
the signal
   // (i.e. all actions return false)
-  val escalate = actions.asScala.forall { action => !action() }
+  val escalate = actions.asScala.map(action => action()).forall(_ == 
false)
--- End diff --

good idea


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10001][Core] Don't short-circuit action...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12745#issuecomment-215302197
  
removed the label, sorry about that


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10001][Core][Hotfix] Don't short-circui...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12745#issuecomment-215272244
  
looks like a flaky test
jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14511][Build] Upgrade genjavadoc to lat...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12707#issuecomment-215258121
  
Hmm, the current genjavadoc also produces errors, so I didn't really look into it. How does the final output compare? Could this be due to Java 8's stricter behaviour, as @srowen mentioned?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10001][Core][Hotfix] Don't short-circui...

2016-04-27 Thread jodersky
GitHub user jodersky opened a pull request:

https://github.com/apache/spark/pull/12745

[SPARK-10001][Core][Hotfix] Don't short-circuit actions in signal handlers

## What changes were proposed in this pull request?
The current signal handlers have a subtle bug: they stop evaluating registered actions as soon as one of them returns true, because `forall` is short-circuited.
This PR adds a strict mapping stage before evaluating the returned results.
There are no known occurrences of the bug, so this is a preemptive fix.
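
A minimal standalone illustration of the difference (not the actual handler code):

```scala
// With forall, evaluation stops at the first action that returns true, so later actions
// never run; mapping first forces every registered action to run.
val actions: Seq[() => Boolean] = Seq(
  () => { println("action 1"); true },
  () => { println("action 2"); false }
)

val shortCircuited = actions.forall(action => !action())         // prints only "action 1"
val strict = actions.map(action => action()).forall(_ == false)  // prints both actions
```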

## How was this patch tested?
As with the original introduction of signal handlers, this was tested 
manually (unit testing with signals is not straightforward).



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jodersky/spark SPARK-10001-hotfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12745


commit 7fe0e54c5e39ff942fd58a16d5e5e309887ee883
Author: Jakob Odersky <ja...@odersky.com>
Date:   2016-04-27T22:26:41Z

Evaluate all actions in signal handlers (don't short-circuit)







[GitHub] spark pull request: [SPARK-14661] [MLlib] trim PCAModel by require...

2016-04-27 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/12419#discussion_r61335359
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
@@ -379,15 +379,21 @@ class RowMatrix @Since("1.0.0") (
    *
    * Note that this cannot be computed on matrices with more than 65535 columns.
    *
-   * @param k number of top principal components.
+   * @param filter either the number of top principal components or variance
+   *   retained by the minimal set of principal components.
    * @return a matrix of size n-by-k, whose columns are principal components, and
    * a vector of values which indicate how much variance each principal component
    * explains
    */
   @Since("1.6.0")
-  def computePrincipalComponentsAndExplainedVariance(k: Int): (Matrix, Vector) = {
+  def computePrincipalComponentsAndExplainedVariance(filter: Either[Int, Double])
--- End diff --

Not sure about the breakage; nevertheless, I would recommend implementing a
new method. I find the method's parameter type `Either[Int, Double]` quite
confusing.
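
To illustrate the concern with a standalone, hypothetical stand-in (this is
not the actual `RowMatrix` API), the call site gives the reader no hint which
side of the `Either` means what:

    object EitherParamDemo extends App {
      // Hypothetical stand-in for the proposed signature.
      def computePCA(filter: Either[Int, Double]): String = filter match {
        case Left(k)         => s"keep the top $k components"
        case Right(variance) => s"keep components explaining ${variance * 100}% of the variance"
      }

      // Nothing at the call site says that Left is a count and Right is a
      // variance threshold.
      println(computePCA(Left(10)))
      println(computePCA(Right(0.95)))
    }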





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215189992
  
Could you change the title to `[SPARK-14954] (current title)`?





[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12734#issuecomment-215186982
  
Does this PR fix a ticket? If so, it would be useful to change the title to
include the [SPARK-] prefix so that the JIRA status gets updated.





[GitHub] spark pull request: [SPARK-14661] [MLlib] trim PCAModel by require...

2016-04-27 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/12419#discussion_r61313031
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
@@ -379,15 +379,21 @@ class RowMatrix @Since("1.0.0") (
    *
    * Note that this cannot be computed on matrices with more than 65535 columns.
    *
-   * @param k number of top principal components.
+   * @param filter either the number of top principal components or variance
+   *   retained by the minimal set of principal components.
    * @return a matrix of size n-by-k, whose columns are principal components, and
    * a vector of values which indicate how much variance each principal component
    * explains
    */
   @Since("1.6.0")
-  def computePrincipalComponentsAndExplainedVariance(k: Int): (Matrix, Vector) = {
+  def computePrincipalComponentsAndExplainedVariance(filter: Either[Int, Double])
--- End diff --

I'm no expert in the ML domain, but from a user perspective, this breaks 
API backwards compatibility.
An alternative could be to create a new method and factor out common 
behaviour shared with the current 
`computePrincipalComponentsAndExplainedVariance` into a private utility method.
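
A minimal sketch of that alternative (names and the simplified return type
are hypothetical; the real method does an SVD and returns a `(Matrix, Vector)`):

    class PCASketch {
      // Existing Int-based API, left unchanged.
      def computePrincipalComponentsAndExplainedVariance(k: Int): Seq[Double] =
        explainedVariance().take(k)

      // New entry point for a variance threshold, instead of switching the
      // existing signature to Either[Int, Double].
      def computePrincipalComponentsRetainingVariance(threshold: Double): Seq[Double] = {
        val variances = explainedVariance()
        val cumulative = variances.scanLeft(0.0)(_ + _).tail
        variances.take(cumulative.indexWhere(_ >= threshold) + 1)
      }

      // Shared private helper, standing in for the real SVD-based computation.
      private def explainedVariance(): Seq[Double] = Seq(0.6, 0.25, 0.1, 0.05)
    }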





[GitHub] spark pull request: [SPARK-14511][Build] Upgrade genjavadoc to lat...

2016-04-27 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12707#issuecomment-215010184
  
The work I submitted upstream doesn't specifically target that issue;
however, it is plausible that it fixes the private interface issue as a side
effect. I haven't checked it, though.

On Wed, Apr 27, 2016 at 1:14 AM, Sean Owen <notificati...@github.com> wrote:

> Seems reasonable to me if the upstream version does all that's needed.
> I wonder if this new version happens to fix the problem in ...
> https://issues.apache.org/jira/browse/SPARK-3359
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/12707#issuecomment-215006856>
>






[GitHub] spark pull request: [MINOR][MAINTENANCE] Deprecation fixes.

2016-04-26 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12653#issuecomment-214857003
  
If I recall correctly, Scala 2.10 will stay for 2.0.0 (though probably not for
all future 2.x releases; it could be removed as soon as 2.1.0). However, I see
no problem in reopening the issue when the time comes; you or any committer
can do that.
I would suggest you rename the PR to include the JIRA number, something like
`[SPARK-14417] Deprecation fixes from Scala 2.10`, before you close it. That
way it will be linked from the issue tracker in the future.

thanks for working on this





[GitHub] spark pull request: Upgrade genjavadoc and use upstream version

2016-04-26 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12707#issuecomment-214849927
  
cc @mengxr
@JoshRosen, is the Maven build OK? I could only find references to genjavadoc
in the sbt build.





[GitHub] spark pull request: Upgrade genjavadoc and use upstream version

2016-04-26 Thread jodersky
GitHub user jodersky opened a pull request:

https://github.com/apache/spark/pull/12707

Upgrade genjavadoc and use upstream version

## What changes were proposed in this pull request?
In the past, genjavadoc had issues with package-private members, which led the
Spark project to use a forked version. This issue has been fixed upstream
(typesafehub/genjavadoc#70) and a release is available for Scala versions 2.10,
2.11 **and 2.12**, so a forked version for Spark is no longer necessary.
This pull request updates the build configuration to use the newest upstream
genjavadoc.
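
For context, wiring genjavadoc into an sbt build generally looks something
like the following; the coordinates and version here are illustrative and not
necessarily the exact values used in this PR:

    // In the sbt build definition: pull the compiler plugin from the upstream
    // artifact (illustrative version) instead of a fork.
    addCompilerPlugin(
      "com.typesafe.genjavadoc" %% "genjavadoc-plugin" % "0.10" cross CrossVersion.full)

    // Tell the plugin where to emit the generated Java stubs.
    scalacOptions += "-P:genjavadoc:out=" + (target.value / "java")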

## How was this patch tested?
The build was run with `sbt unidoc`. During the process, javadoc emits some
errors on the generated Java stubs; however, these errors were also present
before the upgrade. Furthermore, the produced HTML is fine.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jodersky/spark SPARK-14511-genjavadoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12707


commit f7d649c47d0d6f082b5a6237e4b60e8bd8cf3716
Author: Jakob Odersky <ja...@odersky.com>
Date:   2016-04-25T22:58:40Z

Upgrade genjavadoc and use upstream version







[GitHub] spark pull request: [SPARK-14347] [WIP] Require Java 8 for Spark 2...

2016-04-22 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12165#issuecomment-213328740
  
@srowen, @rxin, considering the discussions around the matter, should this
PR be closed?





[GitHub] spark pull request: Do not allow alter column commands.

2016-04-22 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12332#issuecomment-213325682
  
@yhuai, is this PR still actively used for testing purposes? If not, would
you mind closing it? Thanks.





[GitHub] spark pull request: merge

2016-04-22 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12442#issuecomment-213320568
  
It looks like this PR was opened by mistake. @Sunshineit, @srowen (or anyone
with commit powers), could you please close it?




