[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

Glad to see this being merged. Let me know if there is anything more I can do to help with any issues.
[GitHub] spark pull request #17230: [SPARK-19353][CORE] Generalize PipedRDD to use I/...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/17230#discussion_r108307572

--- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
```
@@ -198,17 +183,114 @@ private[spark] class PipedRDD[T: ClassTag](
         val t = childThreadException.get()
         if (t != null) {
           val commandRan = command.mkString(" ")
-          logError(s"Caught exception while running pipe() operator. Command ran: $commandRan. " +
-            s"Exception: ${t.getMessage}")
-          proc.destroy()
+          logError("Caught exception while running pipe() operator. " +
+            s"Command ran: $commandRan.", t)
           cleanup()
+          proc.destroy()
           throw t
         }
       }
     }
   }
 }

+/** Specifies how to write the elements of the input [[RDD]] into the pipe. */
+trait InputWriter[T] extends Serializable {
+  def write(dos: DataOutput, elem: T): Unit
+}
+
+/** Specifies how to read the elements from the pipe into the output [[RDD]]. */
+trait OutputReader[T] extends Serializable {
+  /**
+   * Reads the next element.
+   *
+   * The input is guaranteed to have at least one byte.
+   */
+  def read(dis: DataInput): T
+}
+
+class TextInputWriter[I](
+    encoding: String = Codec.defaultCharsetCodec.name,
+    printPipeContext: (String => Unit) => Unit = null,
+    printRDDElement: (I, String => Unit) => Unit = null
+  ) extends InputWriter[I] {
+
+  private[this] val lineSeparator = System.lineSeparator().getBytes(encoding)
+  private[this] var initialized = printPipeContext == null
+
+  private def writeLine(dos: DataOutput, s: String): Unit = {
+    dos.write(s.getBytes(encoding))
+    dos.write(lineSeparator)
+  }
+
+  override def write(dos: DataOutput, elem: I): Unit = {
+    if (!initialized) {
+      printPipeContext(writeLine(dos, _))
+      initialized = true
+    }
+
+    if (printRDDElement == null) {
+      writeLine(dos, String.valueOf(elem))
+    } else {
+      printRDDElement(elem, writeLine(dos, _))
+    }
+  }
+}
+
+class TextOutputReader(
+    encoding: String = Codec.defaultCharsetCodec.name
+  ) extends OutputReader[String] {
+
+  private[this] val lf = "\n".getBytes(encoding)
+  private[this] val cr = "\r".getBytes(encoding)
+  private[this] val crlf = cr ++ lf
+  private[this] var buf = Array.ofDim[Byte](64)
+  private[this] var used = 0
+
+  @inline
+  /** Checks that the suffix of [[buf]] matches [[other]]. */
+  private def endsWith(other: Array[Byte]): Boolean = {
+    var i = used - 1
+    var j = other.length - 1
+    (j <= i) && {
+      while (j >= 0) {
+        if (buf(i) != other(j)) {
+          return false
+        }
+        i -= 1
+        j -= 1
+      }
+      true
+    }
+  }
+
+  override def read(dis: DataInput): String = {
```
--- End diff --

Could a `dis.readLine()` be used here and would it be more efficient?
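As a side note on the `readLine()` question (an editorial aside, not from the PR thread): `DataInput.readLine` assembles each character by zero-extending a single byte, so it cannot honour a configurable `encoding` such as UTF-8. A minimal, JDK-only sketch of the pitfall:

```scala
import java.io.{ByteArrayInputStream, DataInputStream}

object ReadLineDemo extends App {
  val bytes = "héllo\n".getBytes("UTF-8") // 'é' encodes to the two bytes 0xC3 0xA9
  val dis = new DataInputStream(new ByteArrayInputStream(bytes))
  // readLine zero-extends each byte to a char, so the two UTF-8 bytes of 'é'
  // come back as the two characters 'Ã' and '©' instead of one 'é'.
  println(dis.readLine()) // prints "hÃ©llo"
}
```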
[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/16247

jenkins, retest this please
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/16210

+1 to implementing it in the command builder. However, I'm not sure whether the command-line option `-usejavacp` or the property `scala.usejavacp` is the recommended way to add the Java class path to the REPL. I skimmed through the Scala compiler's code and it seems the two do slightly different things. I'll look into this in more detail.
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/16210

The command-line parameter `-Dscala.usejavacp` is passed directly to the JVM and should not be OS-dependent. I wonder whether the issue instead arises from the way the parameter is defined in the spark-shell launcher scripts. Debian Etch is very old and probably ships an older version of bash.
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

@hvanhovell, I just wanted to add that option 2 from my last comment won't work: if we replace `\\` with `\` in the unescape function (as is done for `\%` and `\_`), it becomes impossible for a user to specify a single backslash. Considering that, another potential solution would be to change the ANTLR parser to handle escaping differently in LIKE expressions. This would however introduce a special case that complicates the overall rules. Maybe leaving the current behaviour but documenting that backslashes are escaped at the parser level is the best option?
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

@gatorsmile, I updated the PR according to your comments. While adding the example test

```scala
checkEvaluation("""%SystemDrive%\Users\John""" like """\%SystemDrive\%\\Users%""", true)
```

I noticed that the equivalent SQL string:

```scala
spark.sql("""'%SystemDrive%\Users\John like '\%SystemDrive\%\\Users%'""")
```

will throw an exception: `org.apache.spark.sql.AnalysisException: the pattern '\%SystemDrive\%\Users%' is invalid, the escape character is not allowed to precede 'U';` (note that a backslash is missing before 'Users').

I traced the issue down to the ANTLR4 parser, which calls [unescapeSQLString](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala#L100) on the string passed to `spark.sql`, replacing any escaped character except for `\_` and `\%`, and thereby removing one backslash before "Users". There are two potential solutions, both flawed in my opinion:

1. Escape the backslash twice, requiring 4 backslashes in total: `spark.sql("""'%SystemDrive%\Users\John like '\%SystemDrive\%\\\\Users%'""")`. This is inconsistent, as escaping the percent sign requires only one backslash.
2. Add a special rule to the parser that ignores double backslashes. This won't play nice with custom escape characters in the future.

My question is more fundamental: why are SQL strings unescaped in the first place? Should it not be up to the frontend language to escape such strings? I.e. Scala/Java/Python should handle replacing sequences such as `\n` with the newline character or unicode escapes with their corresponding symbols. Getting rid of the extra unescaping would, in my opinion, be the cleanest way to avoid double-escaping.
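To make the double-unescaping concrete, here is a simplified stand-in for the parser-level pass (not Spark's actual `ParserUtils.unescapeSQLString`, which also handles `\n`, unicode escapes, etc.): it drops one level of backslash escaping but keeps `\%` and `\_` intact, which is exactly how the backslash before 'Users' disappears.

```scala
object DoubleEscapeDemo extends App {
  // Simplified stand-in for the parser's unescaping pass (illustrative only).
  def unescapeSQLString(s: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < s.length) {
      if (s(i) == '\\' && i + 1 < s.length) {
        s(i + 1) match {
          case c @ ('%' | '_') => out += '\\' += c // kept verbatim for the LIKE matcher
          case c               => out += c         // one backslash consumed here
        }
        i += 2
      } else {
        out += s(i)
        i += 1
      }
    }
    out.toString
  }

  val pattern = """\%SystemDrive\%\\Users%""" // as written inside spark.sql("... like '...'")
  println(unescapeSQLString(pattern)) // \%SystemDrive\%\Users% : the matcher sees one backslash fewer
}
```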
[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15513#discussion_r84591482

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
```
@@ -125,7 +129,7 @@ case class DescribeFunctionCommand(
         if (isExtended) {
           result :+
-            Row(s"Extended Usage:\n${replaceFunctionName(info.getExtended, info.getName)}")
+            Row(s"Extended Usage:${replaceFunctionName(info.getExtended, info.getName)}")
```
--- End diff --

Indeed, annotations require constant parameters (probably due to JVM requirements). Since `stripMargin` is a method on a string wrapper, it unfortunately cannot be used as an annotation argument.
[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15513

@HyukjinKwon (posting here to sum up the inline discussion) I looked through the way the final description string is created before printing, and it looks like the formatting will actually play nicely with the extra indentation resulting from the use of triple quotes without margin stripping. So in the end we're just debating a matter of style here, in which case I gladly accept your choice. Thanks for the work, 👍 to the formatting.
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

@gatorsmile, thanks for the review. I'll update the test suite as we discussed initially :)
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r84582746

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
```
@@ -68,7 +68,20 @@ trait StringRegexExpression extends ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+    "null if any arguments are null, false otherwise.",
+  extended =
+    "The pattern is a string which is matched literally, with exception to the following " +
+    "special symbols:\n\n" +
+    "_ matches any one character in the input (similar to . in posix " +
+    "regular expressions)\n\n" +
+    "% matches zero or more characters in the input (similar to .* in " +
+    "posix regular expressions)\n\n" +
+    "The escape character is '\\'. If an escape character precedes a special symbol or " +
+    "another escape character, the following character is matched literally. For example, " +
+    "the expression `str like \\%SystemDrive\\%Users%` will match any `str` that " +
+    "starts with '%SystemDrive%\\Users'. It is invalid to escape any other character.\n\n" +
+    "Use RLIKE to match with standard regular expressions.")
```
--- End diff --

ack
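For illustration, a tiny standalone conversion in the spirit of this PR (not the exact `StringUtils.escapeLikeRegex`, which additionally validates escape sequences) can check the documented semantics: `_` becomes `.`, `%` becomes `.*`, and an escaped character is matched literally.

```scala
import java.util.regex.Pattern

object LikeDemo extends App {
  // Illustrative LIKE-to-regex conversion, assuming '\' as the escape character.
  def likeToRegex(pattern: String): String = {
    val out = new StringBuilder("(?s)") // (?s) so that _ and % also match newlines
    val it = pattern.iterator
    while (it.hasNext) {
      it.next() match {
        case '\\' if it.hasNext => out ++= Pattern.quote(it.next().toString)
        case '_'                => out += '.'
        case '%'                => out ++= ".*"
        case c                  => out ++= Pattern.quote(c.toString)
      }
    }
    out.toString
  }

  // The example from the doc string above: the input starts with '%SystemDrive%\Users'.
  println("""%SystemDrive%\Users\John""".matches(likeToRegex("""\%SystemDrive\%\\Users%"""))) // true
}
```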
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r84582662

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---
```
@@ -19,32 +19,34 @@ package org.apache.spark.sql.catalyst.util
 import java.util.regex.{Pattern, PatternSyntaxException}

+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.unsafe.types.UTF8String

 object StringUtils {

-  // replace the _ with .{1} exactly match 1 time of any character
-  // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-    if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
+  /** Convert 'like' pattern to Java regex. */
+  def escapeLikeRegex(str: String): String = {
+    val in = str.toIterator
+    val out = new StringBuilder()
+
+    def fail(message: String) = throw new AnalysisException(
+      s"the pattern '$str' is invalid, $message")
```
--- End diff --

> org.apache.spark.sql.AnalysisException: the pattern '\a' is invalid, the escape character is not allowed to precede 'a';

or

> org.apache.spark.sql.AnalysisException: The pattern '\a' is invalid. Reason: the escape character is not allowed to precede 'a';

Isn't this just a matter of preference? I preferred the first one because it avoids sentences that are expected to end in a period, given that a semicolon is always appended after an analysis exception. It's a really superficial thing though, so I'm happy to change it either way.
[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15513

I agree that the extended part is usually multiple lines; my observation is that a newline will be inserted *before* any text, since the string starts on the line following `extended="""`. Just wanted to make sure this is intended. Regarding `stripMargin`: unfortunately `stripMargin` is a method on a string wrapper, hence not a constant, and it cannot be used in annotation arguments. I ran into this myself in #15398. A workaround I could imagine is to call `stripMargin` here: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L498
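A minimal sketch of the constraint and the suggested workaround (illustrative only; the annotation itself is elided since the point is purely about Scala annotation arguments versus runtime code):

```scala
object StripMarginDemo extends App {
  // Inside an annotation this form is rejected, because annotation arguments
  // must be compile-time constants and stripMargin is a method call on a
  // string wrapper (StringOps):
  //
  //   @ExpressionDescription(extended = """|Arguments: ...""".stripMargin)
  //   // error: annotation argument needs to be a constant
  //
  // At runtime, e.g. where the registry renders the description, the same
  // call is perfectly fine:
  val extended =
    """|Arguments:
       |  class - a string literal that represents a fully-qualified class name
       |""".stripMargin
  println(extended)
}
```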
[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15513

More documentation is always great, thanks! Are the newlines at the beginning of each expression intentional? E.g.

```
extended = """
    Arguments:
      class - a string literal
    ...
```

will have a newline before "Arguments". Also, keep in mind that the indentation will be treated verbatim in triple quotes. Would it make sense to call `.stripMargin` before printing the extended usage string? That way it would be possible to use pipes (|) to get a nice format in the source code as well as when it is printed for the user.
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

I went ahead and changed the parsing function to throw an `AnalysisException` in case the input escape sequence is invalid. Feel free to revert (or just tell me and I'll do it) if we should stick to Hive's behaviour. Also, I hope an `AnalysisException` is the correct way to report an error here; let me know if it should be something else.
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

FWIW, I just read this about the standard:

> In an ANSI-compliant database, you can only use the LIKE escape character to escape a percent sign ( % ), an underscore ( _ ), or the escape character itself.

([source](https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_1388.htm))

I think we should reconsider the need to follow Hive here.
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r84143867

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
```
@@ -68,7 +68,20 @@ trait StringRegexExpression extends ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+    "null if any arguments are null, false otherwise.",
+  extended =
+    "The pattern is a string which is matched literally, with exception to the " +
+    "following special symbols:\n\n" +
+    "_ matches any one character in the input (similar to . in posix " +
+    "regular expressions)\n\n" +
+    "% matches zero or more characters in the input (similar to .* in " +
+    "posix regular expressions)\n\n" +
+    "The default escape character is '\\'. If an escape character precedes a special symbol or " +
+    "another escape character, the following character is matched literally, otherwise the " +
+    "escape character is treated literally. I.e. '\\%' would match '%', whereas '\\a' matches " +
+    "'\\a'.\n\n" +
```
--- End diff --

good idea
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

I understand your comment about the weird escaping behaviour, @mengxr. Putting myself in the shoes of a new user, I would be least surprised if Spark either treated the string verbatim (as in "anything after a backslash is treated literally") or gave me an error telling me that my escape sequence doesn't make sense. Changing the parsing of the escape character based on what follows it also makes the code more complicated. However, I also understand the high priority of making migration from Hive to Spark SQL as straightforward as possible. I therefore concluded that existing queries should behave the same, and reimplemented Hive's 'like' pattern matching. Maybe this wasn't the right trade-off for non-Hive users? I was unable to find any docs that detail the escaping behaviour in the scenarios mentioned above, so I basically followed the implementation at the [link Simon provided](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L64). @rxin, do you think this case warrants diverging from Hive?
[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15338

no objections from me ;)
[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

Agreed, I'll update the regex parsing to copy Hive's behaviour.
[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15520

> Yes, I think so. It just needs more lines :)

I think the solution you propose is sound. Do you think a quick mention in a comment about avoiding the `BlockingContext` would help, in case someone sees this code in the future and is puzzled as to what it does?
[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15520

That makes sense, thanks for explaining. Would installing a custom `BlockingContext` on threads that run tasks also work? I'm not very familiar with the internals of the `Awaitable` API, so I'm just throwing this out there in the hope that you can tell me more :)
[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15520

I'm not sure I completely understand why you have to use `Awaitable` instead of `Await`. Is it to avoid the [`blocking()` call in `Await`](https://github.com/scala/scala/blob/v2.12.0-RC1/src/library/scala/concurrent/package.scala#L168)?
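For context, a generic, self-contained sketch (not Spark code) of what the `blocking()` hook does: it notifies the current thread's `BlockingContext`, which on scala.concurrent's default ForkJoinPool-backed executor may spawn compensation threads while a task blocks. That thread-spawning behaviour is what bypassing `Await` avoids.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object BlockingDemo extends App {
  val start = System.nanoTime()
  val futures = (1 to 32).map { i =>
    Future {
      // Marking the sleep as blocking lets the ForkJoinPool create extra
      // threads; without it, only about numCores sleeps run concurrently.
      blocking { Thread.sleep(500) }
      i
    }
  }
  Await.result(Future.sequence(futures), 1.minute)
  println(s"elapsed ms: ${(System.nanoTime() - start) / 1000000}") // close to 500, not (32 / cores) * 500
}
```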
[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r83520908

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ---
```
@@ -74,6 +107,31 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation("a\nb" like regEx, true, create_row("a%b"))

     checkEvaluation(Literal.create(null, StringType) like regEx, null, create_row("bc%"))
+
+    checkEvaluation("" like regEx, true, create_row(""))
+    checkEvaluation("a" like regEx, false, create_row(""))
+    checkEvaluation("" like regEx, false, create_row("a"))
+
+    checkEvaluation("""""" like regEx, true, create_row("""%\\%"""))
+    checkEvaluation("""%%""" like regEx, true, create_row("""%%"""))
+    checkEvaluation("""\__""" like regEx, true, create_row("""\\\__"""))
+    checkEvaluation("""\\\__""" like regEx, false, create_row("""%\\%\%"""))
+    checkEvaluation("""_\\\%""" like regEx, false, create_row("""%\\"""))
+
+    // scalastyle:off nonascii
+    checkEvaluation("a\u20ACa" like regEx, true, create_row("_\u20AC_"))
+    checkEvaluation("a€a" like regEx, true, create_row("_€_"))
+    checkEvaluation("a€a" like regEx, true, create_row("_\u20AC_"))
+    checkEvaluation("a\u20ACa" like regEx, true, create_row("_€_"))
+    // scalastyle:on nonascii
+
+    // TODO: should throw an exception?
```
--- End diff --

> The escape character must be a single character no more than two bytes in length. It can only appear in the pattern string if it is followed by itself, a percent sign, or an underscore.

Does this mean that PostgreSQL's behaviour is wrong? Should Spark throw an exception in point 2?
[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

Thanks for the feedback. It was my plan to add the escape syntax in a separate PR. The main questions that concern me here are the points I enumerate in this PR's description, namely:

- should Spark allow ending patterns in the escape character?
- how to handle empty input and/or patterns?
- escaping "regular" characters?
[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15338

You raise a good point about having to commit to supporting environment variables in future versions. A command-line option would be preferable.
[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

The style issues are weird. Running scalastyle in sbt worked fine...
[GitHub] spark issue #15284: [SPARK-17368] [SQL] Add support for value class serializ...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15284

cc @marmbrus
[GitHub] spark pull request #15284: [SPARK-17368] [SQL] Add support for value class s...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15284#discussion_r83334074

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
```
@@ -628,7 +628,7 @@ object ScalaReflection extends ScalaReflection {
   /*
    * Retrieves the runtime class corresponding to the provided type.
    */
-  def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.erasure.typeSymbol.asClass)
+  def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.typeSymbol.asClass)
```
--- End diff --

this is the main change
[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15338

LGTM. Let's see if @nchammas has some feedback. He had some very helpful comments in #3881.
[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r83329826

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---
```
@@ -25,26 +25,25 @@ object StringUtils {
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-    if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
-          c match {
-            case '_' => "."
-            case '%' => ".*"
-            case _ => Pattern.quote(Character.toString(c))
-          }
-      }.mkString
-    } else {
-      v
+  def escapeLikeRegex(str: String): String = {
+    val builder = new StringBuilder()
+    var escaping = false
+    for (next <- str) {
+      if (escaping) {
+        builder ++= Pattern.quote(Character.toString(next))
```
--- End diff --

I think we're misunderstanding each other; there are so many different contexts in which a backslash is used :) The prefix `'\\'` that matches the escape char here is the *Scala source* escape for the literal character `'\'` (if Scala had triple quotes for chars as it does for strings, I could have written `'''\'''`). So, in the if statement, `\\` will consume one character, a `\`. The subsequent `\` will be quoted in the `escaping` case, and the final `a` will also be quoted in the last case, resulting in `\Q\\E\Qa\E`.
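The quoting described above is easy to verify with plain `java.util.regex` (no Spark needed); a quick sketch:

```scala
import java.util.regex.Pattern

object QuoteDemo extends App {
  val quotedBackslash = Pattern.quote("\\") // \Q\\E: a regex matching one literal backslash
  val quotedA = Pattern.quote("a")          // \Qa\E
  println(quotedBackslash + quotedA)        // \Q\\E\Qa\E, as described above
  println("\\a".matches(quotedBackslash + quotedA)) // true: matches the two-character input \a
}
```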
[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

thanks @yhuai for the feedback! I enumerated some issues in the pull request description; do you have any input on them? My personal opinion is that we should change the regex matching to conform to point 1.
[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82866433

--- Diff: sbin/spark-daemon.sh ---
```
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi

+execute_command() {
+  command="$@"
+  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
```
--- End diff --

this only checks whether the variable is non-empty; setting it to the empty string will not work. E.g. `export SPARK_NO_DAEMONIZE=''` will not satisfy the condition. Something like `if [ -z ${SPARK_NO_DAEMONIZE+set} ];` will also handle the null case (see [this SO thread](http://stackoverflow.com/questions/3601515/how-to-check-if-a-variable-is-set-in-bash/3601734#3601734)). Another, more elegant solution would be to use `[[ -v SPARK_NO_DAEMONIZE ]]`, however that requires at least bash 4.2 and is probably not available on all systems that Spark runs on.
[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82874206

--- Diff: sbin/spark-daemon.sh ---
```
@@ -146,13 +176,11 @@ run_command() {
   case "$mode" in
     (class)
-      nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
-      newpid="$!"
+      execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-class\" $command $@"
       ;;

     (submit)
-      nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &
-      newpid="$!"
+      execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-submit\" --class $command $@"
```
--- End diff --

same as above
[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82874084

--- Diff: sbin/spark-daemon.sh ---
```
@@ -146,13 +176,11 @@ run_command() {
   case "$mode" in
     (class)
-      nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
-      newpid="$!"
+      execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-class\" $command $@"
```
--- End diff --

the outermost quotes aren't required without the eval. `execute_command nice -n "$SPARK_NICENESS" "${SPARK_HOME}/bin/spark-class" $command $@` should do the trick
[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82875044

--- Diff: sbin/spark-daemon.sh ---
```
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi

+execute_command() {
+  command="$@"
```
--- End diff --

spark scripts require bash as the interpreter, so we can make this a local variable
[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82871978

--- Diff: sbin/spark-daemon.sh ---
```
@@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi

+execute_command() {
+  command="$@"
+  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
+    eval $command
```
--- End diff --

I don't think eval is required here
[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82872476

--- Diff: sbin/spark-daemon.sh ---
```
+execute_command() {
+  command="$@"
+  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
+    eval $command
+  else
+    eval "nohup $command >> \"$log\" 2>&1 < /dev/null &"
```
--- End diff --

same here
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82849408

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---
```
@@ -25,26 +25,25 @@ object StringUtils {
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-    if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
-          c match {
-            case '_' => "."
-            case '%' => ".*"
-            case _ => Pattern.quote(Character.toString(c))
-          }
-      }.mkString
-    } else {
-      v
+  def escapeLikeRegex(str: String): String = {
+    val builder = new StringBuilder()
+    var escaping = false
+    for (next <- str) {
+      if (escaping) {
+        builder ++= Pattern.quote(Character.toString(next))
```
--- End diff --

Every character after a backslash is quoted, so `\\a` becomes `\Q\\E\Qa\E`. Is this not the intended behaviour?
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

cc @viirya, I changed your original code
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

Every non-Java-pattern character is now quoted; I updated the StringUtils test suite.
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82492398

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---
```
@@ -25,26 +25,24 @@ object StringUtils {
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-    if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
-          c match {
-            case '_' => "."
-            case '%' => ".*"
-            case _ => Pattern.quote(Character.toString(c))
-          }
-      }.mkString
-    } else {
-      v
+  def escapeLikeRegex(str: String): String = {
+    val builder = new StringBuilder()
+    str.foldLeft(false) { case (escaping, next) =>
```
--- End diff --

updated in latest commit
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398

I'll add some more tests
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
GitHub user jodersky opened a pull request: https://github.com/apache/spark/pull/15398

[SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patterns.

## What changes were proposed in this pull request?

Modify SQL-to-Java regex escaping to correctly handle cases of double backslashes followed by another special character. This fixes cases such as '' not matching '%\\%'.

## How was this patch tested?

Extra case in regex unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jodersky/spark SPARK-17647

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15398.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15398

commit 42c27180efc2691d3dd117c429077d888b9fe12d
Author: Jakob Odersky <ja...@odersky.com>
Date: 2016-10-08T00:24:22Z

    Fix backslash escaping in 'LIKE' patterns.
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r81669340

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
```
@@ -311,6 +311,37 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
   @Experimental
   def text(path: String): DataFrame = format("text").load(path)

+  /**
+   * Loads text file(s) and returns a [[Dataset]] of String. The underlying schema of the Dataset
```
--- End diff --

Should "text files" be plural here? The API would be more intuitive if it copied the non-streaming equivalent with a vararg method for multiple parameters.
[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15062

I'm just unclear why this is needed. According to the linked issue, some tests were failing and this fixed them. Everything just seems very vague to me. It would be very helpful if you could provide an example (it doesn't have to be a unit test) that reproduces the error. It would greatly help me understand what's going on and also let me distill things down to a regression test.
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81402565

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
```
@@ -46,12 +57,17 @@ object Literal {
     case s: String => Literal(UTF8String.fromString(s), StringType)
     case b: Boolean => Literal(b, BooleanType)
     case d: BigDecimal => Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale))
-    case d: java.math.BigDecimal =>
+    case d: JavaBigDecimal =>
       Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
     case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
     case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
     case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
     case a: Array[Byte] => Literal(a, BinaryType)
+    case a: Array[_] =>
```
--- End diff --

oh, you're right. I didn't realize you specifically wanted an ArrayType
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81273658

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
```
@@ -46,12 +57,17 @@ object Literal {
     case s: String => Literal(UTF8String.fromString(s), StringType)
     case b: Boolean => Literal(b, BooleanType)
     case d: BigDecimal => Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale))
-    case d: java.math.BigDecimal =>
+    case d: JavaBigDecimal =>
       Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale()))
     case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
     case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
     case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
     case a: Array[Byte] => Literal(a, BinaryType)
+    case a: Array[_] =>
```
--- End diff --

@maropu, I tested my solution in a shell and it worked. To fix the compilation error, simply make the overloaded method `ScalaReflection.dataTypeFor` public. I don't see any reason for it being private in the first place, apart from it just being a default.
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81257481 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -60,6 +76,45 @@ object Literal { } /** + * Returns the Spark SQL DataType for a given class object. Since this type needs to be resolved + * in runtime, we use match-case idioms for class objects here. However, there are similar + * functions in other files (e.g., HiveInspectors), so these functions need to merged into one. + */ + private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match { +// primitive types +case c: Class[_] if c == JavaShort.TYPE => ShortType --- End diff -- Cool, I saw you updated the match. However, you can remove the instance check everywhere, including further down. Basically, a `case c: Class[_]` is "equivalent" to a `c.isInstanceOf[Class[_]]` check; however, that is redundant since your parameter `clz` already specifies the type to be `Class[_]`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
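A minimal, self-contained version of the simplification being suggested (using the same `JavaShort` rename as the diff above):

```scala
import java.lang.{Short => JavaShort}

// Redundant: `clz` is already statically known to be a Class[_], so the
// type test in `case c: Class[_]` always succeeds and only adds noise.
def verbose(clz: Class[_]): String = clz match {
  case c: Class[_] if c == JavaShort.TYPE => "ShortType"
  case _ => "other"
}

// Simpler: match the stable identifier directly; the pattern compiles
// down to an equality check against JavaShort.TYPE.
def simple(clz: Class[_]): String = clz match {
  case JavaShort.TYPE => "ShortType"
  case _ => "other"
}
```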
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81213697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -60,6 +76,45 @@ object Literal { } /** + * Returns the Spark SQL DataType for a given class object. Since this type needs to be resolved + * in runtime, we use match-case idioms for class objects here. However, there are similar + * functions in other files (e.g., HiveInspectors), so these functions need to merged into one. + */ + private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match { +// primitive types +case c: Class[_] if c == JavaShort.TYPE => ShortType --- End diff -- matching against an instance of `Class[_]` is not necessary. A simpler solution is `case JavaShort.TYPE => ShortType` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81212761 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -46,12 +57,17 @@ object Literal { case s: String => Literal(UTF8String.fromString(s), StringType) case b: Boolean => Literal(b, BooleanType) case d: BigDecimal => Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale)) -case d: java.math.BigDecimal => +case d: JavaBigDecimal => Literal(Decimal(d), DecimalType(Math.max(d.precision, d.scale), d.scale())) case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale)) case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType) case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType) case a: Array[Byte] => Literal(a, BinaryType) +case a: Array[_] => --- End diff -- I tried using `ScalaReflection.dataTypeFor` directly (I first had to make it public, but I think that's a totally valid option). Something like `ScalaReflection.dataTypeFor(rm.classSymbol(a.getClass).selfType)` works perfectly. I'm not sure why `schemaFor` fails; it may be a bug. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
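For reference, a sketch of that call spelled out; `rm` is taken to be the runtime mirror, and the call assumes the `Type`-taking overload of `dataTypeFor` has been made public, which is exactly the change proposed above:

```scala
import scala.reflect.runtime.{currentMirror => rm}
import org.apache.spark.sql.catalyst.ScalaReflection

// Recover a scala.reflect Type from the array's runtime class and hand it
// to catalyst's type derivation, bypassing the erased-type problem.
val a: Array[_] = Array(1, 2, 3)
val dataType = ScalaReflection.dataTypeFor(rm.classSymbol(a.getClass).selfType)
```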
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81207545 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -60,6 +70,45 @@ object Literal { } /** + * Returns the Spark SQL DataType for a given class object. Since this type needs to be resolved + * in runtime, we use match-case idioms for class objects here. However, there are similar + * functions in other files (e.g., HiveInspectors), so these functions need to merged into one. + */ + private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match { +// primitive types +case c: Class[_] if c == jShort.TYPE => ShortType --- End diff -- Here is an example:
```scala
case object container { def value: Int = 42 }
case object Container { def value: Int = 42 }

def extract(wrapper: Any) = wrapper match {
  case container => container.value // error: value "value" is not a member of Any
  case Container => Container.value // ok
}
```
It's not a very good example; maybe something like http://stackoverflow.com/a/6576663/917519 will give you a better use-case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15257: [SPARK-17683][SQL] Support ArrayType in Literal.a...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15257#discussion_r81033312 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -60,6 +70,45 @@ object Literal { } /** + * Returns the Spark SQL DataType for a given class object. Since this type needs to be resolved + * in runtime, we use match-case idioms for class objects here. However, there are similar + * functions in other files (e.g., HiveInspectors), so these functions need to merged into one. + */ + private[this] def componentTypeToDataType(clz: Class[_]): DataType = clz match { +// primitive types +case c: Class[_] if c == jShort.TYPE => ShortType --- End diff -- The instance matching is not required here; it should be sufficient to use `case jShort.TYPE`. Btw, this is a nitpick, but I would recommend that classes start with a capital letter, i.e. `java.lang.Short` could be imported as `JShort` rather than `jShort`. In this case it really doesn't matter, but lower case classes can bite you when you're working with pattern matching. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15078: [SPARK-17523] Support to generate build info file when b...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15078 How widespread is PowerShell on Windows computers? I assume that modern versions ship with it, but I have also heard that they come with a bash-like shell in the latest version. Does the current build work in the new environment? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15062 Oh, I was too quick to comment, I see a complete description is in the JIRA. It would still be good if you could add a test though. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15062: SPARK-17424: Fix unsound substitution bug in ScalaReflec...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15062 I don't quite understand what use-case this patch fixes. Can you provide an example (in the form of a test) that reproduces the issue? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15284: [SPARK-17368] [SQL] Add support for value class s...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15284#discussion_r81025673 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala --- @@ -66,8 +66,6 @@ case class RepeatedData( mapFieldNull: scala.collection.Map[Int, java.lang.Long], structField: PrimitiveData) -case class SpecificCollection(l: List[Int]) --- End diff -- cleanup: found this unused class whilst going over the code --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15284: [SPARK-17368] [SQL] Add support for value class s...
GitHub user jodersky opened a pull request: https://github.com/apache/spark/pull/15284 [SPARK-17368] [SQL] Add support for value class serialization and deserialization ## What changes were proposed in this pull request? Value classes were unsupported because catalyst data types were obtained through reflection on erased types, which would resolve to a value class' wrapped type and hence lead to unavailable methods during code generation. This patch simply removes the erasure step when getting data types for catalyst. ## How was this patch tested? Additional tests in `ExpressionEncoderSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jodersky/spark value-classes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15284.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15284 commit a3dce9f5f9ced492fa6f20b93497856fd51fc757 Author: Jakob Odersky <ja...@odersky.com> Date: 2016-09-28T21:19:26Z Add support for value class serialization and deserialization Value classes were unsupported because catalyst data types were obtained through reflection on erased types, which would resolve to a value class' wrapped type and hence lead to unavailable methods during code generation. This patch simply removes the erasure step when getting data types for catalyst. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
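For readers unfamiliar with value classes, a minimal example of the kind of type this patch enables; the class name is illustrative, and the Dataset usage assumes a live `SparkSession` with `spark.implicits._` in scope:

```scala
// A value class wraps exactly one field and erases to that field's type at
// runtime, which is why reflection on the erased type used to see Double
// here rather than Meters, breaking code generation.
case class Meters(value: Double) extends AnyVal

// With the patch, value classes can round-trip through a Dataset:
// val ds = Seq(Meters(1.0), Meters(2.5)).toDS()
// ds.map(m => Meters(m.value * 2)).collect()
```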
[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14691 I totally agree! This PR LGTM, but I think enabling truly pluggable sources through providers will require more discussion on a more visible channel. @zsxwing, what do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14691 I like the idea! This might not be the best place to start a discussion, but I reckon that the sink provider api could also eventually be used to provision builtin sinks. It would make the current, stringly-typed api optional and provide more compile-time safety. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
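To make the contrast concrete, a hypothetical sketch; none of these typed entry points exist in Spark's actual API, and all of the names here are invented:

```scala
// Invented shape of a typed sink API: the provider is an ordinary value
// instead of a string key looked up in a registry at runtime.
trait Sink { def addBatch(batchId: Long, rows: Seq[String]): Unit }
trait SinkProvider { def createSink(): Sink }

class ConsoleSinkProvider extends SinkProvider {
  def createSink(): Sink = new Sink {
    def addBatch(batchId: Long, rows: Seq[String]): Unit = rows.foreach(println)
  }
}

// Stringly-typed today:       df.writeStream.format("console")
// Typed alternative (sketch): df.writeStream.sink(new ConsoleSinkProvider)
```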
[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15027#discussion_r78643449 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String) // It will fail if there is an existing file (someone has committed the batch) logDebug(s"Attempting to write log #${batchIdToPath(batchId)}") fileManager.rename(tempPath, batchIdToPath(batchId)) + + // SPARK-17475: HDFSMetadataLog should not leak CRC files + // If the underlying filesystem didn't rename the CRC file, delete it. --- End diff -- ok, looks good then --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15027#discussion_r78469460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String) // It will fail if there is an existing file (someone has committed the batch) logDebug(s"Attempting to write log #${batchIdToPath(batchId)}") fileManager.rename(tempPath, batchIdToPath(batchId)) + + // SPARK-17475: HDFSMetadataLog should not leak CRC files + // If the underlying filesystem didn't rename the CRC file, delete it. --- End diff -- Is this specific to streaming, or would other parts of spark benefit if this behavior were changed in the file manager? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
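Concretely, the change under review boils down to something like this sketch; `fileManager` and `tempPath` are the names from the surrounding method, their exact API is assumed here, and the `.<name>.crc` naming follows Hadoop's checksum-file convention:

```scala
import org.apache.hadoop.fs.Path

// After fileManager.rename(tempPath, finalPath): if the filesystem left the
// temp file's checksum sidecar behind, delete it so it doesn't accumulate.
val crcPath = new Path(tempPath.getParent, s".${tempPath.getName}.crc")
if (fileManager.exists(crcPath)) {
  fileManager.delete(crcPath)
}
```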
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 Some comments still refer to the use of a queue and should be updated. Other than that, the data structure part now looks good to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14359: [SPARK-16719][ML] Random Forests should communica...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/14359#discussion_r75361458 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -161,31 +161,43 @@ private[spark] object RandomForest extends Logging { None } -// FIFO queue of nodes to train: (treeIndex, node) -val nodeQueue = new mutable.Queue[(Int, LearningNode)]() +/* + FILO queue of nodes to train: (treeIndex, node) + We make this FILO by always inserting nodes by appending (+=) and removing with dropRight. --- End diff -- the methods described here still refer to the original queue data structure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 > I switched to Stack and then realized Stack has been deprecated in Scala 2.11... I think you probably read the *immutable* stack docs; the *mutable* stack is not deprecated AFAIK. I can imagine that having a custom stack implementation may allow for additional operations in the future, however we should also consider that using standard collections reduces the load for anyone who will maintain the code then. Btw, I highly recommend to use the [milestone scaladocs](http://www.scala-lang.org/api/2.12.0-M5/scala/collection/mutable/Stack.html) over the current ones. Although 2.12 is not officially out yet, the changes to the library are minimal and the UI is much more pleasant to use. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 Agree, it's not very obvious. In the latter document I think a `push` is akin to `append` and `pop` to `head` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/14359 @jkbradley, you can find the scaladoc on stacks here: http://www.scala-lang.org/api/current/index.html#scala.collection.mutable.Stack Also, this document http://docs.scala-lang.org/overviews/collections/performance-characteristics gives a nice overview of the different collection types in scala and their performance characteristics. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14359: [SPARK-16719][ML] Random Forests should communica...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/14359#discussion_r72322067 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1110,4 +1124,40 @@ private[spark] object RandomForest extends Logging { } } + /** + * Class for queueing nodes to split on each iteration. + * This is a FILO queue. + */ + private[impl] class NodeQueue { +private var q: mutable.MutableList[(Int, LearningNode)] = + mutable.MutableList.empty[(Int, LearningNode)] + +/** Put (treeIndex, node) into queue. Constant time. */ +def put(treeIndex: Int, node: LearningNode): Unit = { + q += ((treeIndex, node)) +} + +/** Remove and return last inserted element. Linear time (unclear in Scala docs though) */ --- End diff -- How critical is this for performance? I know that `append` is O(1) for scala mutable lists, but `dropRight(1)` is actually implemented in a parent class [1] of `MutableList` and therefore does not take advantage of the list's reference to its last element. All in all, from its implementation it seems that `dropRight` is O(n) [1] https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/LinearSeqOptimized.scala#L194-L204 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
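For comparison, the standard mutable stack gives the same last-in-first-out behavior with constant-time operations (a minimal sketch):

```scala
import scala.collection.mutable

// LIFO traversal with O(1) push and pop, avoiding the O(n) scan that
// MutableList.dropRight(1) needs to remove its last element.
val stack = mutable.Stack.empty[(Int, String)]
stack.push((0, "tree0-root"))
stack.push((0, "tree0-left"))
val (treeIndex, node) = stack.pop() // (0, "tree0-left")
```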
[GitHub] spark issue #13729: [SPARK-16008][ML] Remove unnecessary serialization in lo...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/13729 Hi @dbtsai, I assisted @sethah with some serialization issues during this PR. I know we considered using transient but can't recall exactly why we ended up not doing so. My knowledge about the bigger picture of this PR is quite limited, but one explanation that comes to mind is that the `coefficients` and `featuresStd` parameters are only used within the `add` method. So the reasoning was to keep parameters as local as possible. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
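For context, the two options look roughly like this; class and member names are illustrative, not the PR's actual code:

```scala
// Option 1: keep the array as a field but mark it @transient so it is
// skipped during serialization. Caveat: a transient field comes back as
// null after deserialization, a classic source of executor-side NPEs.
class AggregatorWithField(@transient private val featuresStd: Array[Double])
  extends Serializable

// Option 2, the direction described above: pass the arrays to the one
// method that needs them, so they never become serialized state at all.
class AggregatorWithLocals extends Serializable {
  def add(features: Array[Double], coefficients: Array[Double]): Double =
    features.zip(coefficients).map { case (x, w) => x * w }.sum
}
```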
[GitHub] spark issue #13503: [SPARK-15761] [MLlib] [PySpark] Load ipython when defaul...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/13503 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/13061 jenkins, ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/13061 Last message was a minute too late, so LGTM then. jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/13061 In the short-term we should definitely make the REPL welcome message consistent. Could you consolidate it with the conflict resolution? I don't think there are many changes required. However I'm also good with merging this as is now and submitting a separate jira later --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13061: [SPARK-14279] [Build] Pick the spark version from...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/13061#discussion_r65465157 --- Diff: core/src/main/scala/org/apache/spark/package.scala --- @@ -41,7 +41,53 @@ package org.apache * level interfaces. These are subject to changes or removal in minor releases. */ +import java.util.Properties + package object spark { - // For package docs only - val SPARK_VERSION = "2.0.0-SNAPSHOT" + + private object SparkBuildInfo { +val resourceStream = Thread.currentThread().getContextClassLoader. + getResourceAsStream("spark-version-info.properties") + +val ( + spark_version: String, --- End diff -- oh, this is actually local, so no big deal. If you get a chance to fix it during conflict resolution it would be great though --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13061: [SPARK-14279] [Build] Pick the spark version from pom
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/13061 Sorry to bother you again after such a long delay; however, I just found another minor style bug. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13061: [SPARK-14279] [Build] Pick the spark version from...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/13061#discussion_r65464868 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -103,6 +104,9 @@ object SparkSubmit { /___/ .__/\_,_/_/ /_/\_\ version %s /_/ """.format(SPARK_VERSION)) +printStream.println("Branch %s".format(SPARK_BRANCH)) +printStream.println("Compiled by user %s on %s".format(SPARK_BUILD_USER, SPARK_BUILD_DATE)) +printStream.println("Url %s".format(SPARK_REPO_URL)) --- End diff -- It would be great to also add the build info to org.apache.spark.repl.SparkILoop --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13061: [SPARK-14279] [Build] Pick the spark version from...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/13061#discussion_r65464743 --- Diff: core/src/main/scala/org/apache/spark/package.scala --- @@ -41,7 +41,53 @@ package org.apache * level interfaces. These are subject to changes or removal in minor releases. */ +import java.util.Properties + package object spark { - // For package docs only - val SPARK_VERSION = "2.0.0-SNAPSHOT" + + private object SparkBuildInfo { +val resourceStream = Thread.currentThread().getContextClassLoader. + getResourceAsStream("spark-version-info.properties") + +val ( + spark_version: String, --- End diff -- is this purposefully using underscores? `sparkVersion` is consistent with the general style --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14279] [Build] Pick the spark version f...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/13061#issuecomment-222380888 What I understand from the conversation is that maven assumes the target/extra-resources directory: > Unfortunately I'm not an maven expert but I wonder if we can put it in like target/extra-resources and have it automatically picked up still. http://maven.apache.org/plugins/maven-resources-plugin/examples/copy-resources.html I'm suggesting to use `resourceManaged` in sbt, since I have had past problems with resources not getting picked up automatically when they were not in that directory. It's a minor detail and if everything works for you, then this LGTM too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14279] [Build] Pick the spark version f...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/13061#discussion_r64942108 --- Diff: project/SparkBuild.scala --- @@ -436,7 +439,19 @@ object SparkBuild extends PomBuild { else x.settings(Seq[Setting[_]](): _*) } ++ Seq[Project](OldDeps.project) } +} +object Core { + lazy val settings = Seq( +resourceGenerators in Compile += Def.task { + val buildScript = baseDirectory.value + "/../build/spark-build-info" + val targetDir = baseDirectory.value + "/target/extra-resources/" + val command = buildScript + " " + targetDir + " " + version.value + Process(command).!! + val propsFile = baseDirectory.value / "target" / "extra-resources" / "spark-version-info.properties" --- End diff -- Maybe a nit: shouldn't all resources be generated into `resourceManaged`? I'm not sure how sbt will handle packaging of non-resourcemanaged artifacts; it could be that spark works locally, however errors may occur in a packaged environment. I would suggest renaming `"target"/"extra-resources"` to `(resourceManaged in Compile).value`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
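Concretely, the suggestion would look something like this in sbt 0.13 syntax; the script path and property-file name are taken from the diff, the rest is a sketch:

```scala
import scala.sys.process.Process

resourceGenerators in Compile += Def.task {
  val buildScript = (baseDirectory.value / ".." / "build" / "spark-build-info").getPath
  val targetDir = (resourceManaged in Compile).value
  Process(Seq(buildScript, targetDir.getPath, version.value)).!!
  // Returning the generated file registers it with sbt, so it is packaged
  // automatically and removed by `clean`.
  Seq(targetDir / "spark-version-info.properties")
}.taskValue
```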
[GitHub] spark pull request: [SPARK-14279] Spark Version will be picked fro...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/12152#discussion_r63970293 --- Diff: core/src/main/java/org/apache/spark/VersionInfo.java --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.InputStream; +import java.util.Properties; + +public class VersionInfo { + + private static final Logger LOG = LoggerFactory.getLogger(VersionInfo.class); + private Properties info; + + protected VersionInfo(String component) { +info = new Properties(); +String versionInfoFile = component + "-version-info.properties"; +InputStream is = null; +try { + is = Thread.currentThread().getContextClassLoader() + .getResourceAsStream(versionInfoFile); + if (is == null) { +throw new IOException("Resource not found"); + } + info.load(is); +} catch (IOException ex) { +} finally { + if (is != null) { +try { + is.close(); +} catch (IOException ioex) { +} + } +} + } + + protected String _getVersion() { +return info.getProperty("version", "Unknown"); + } + + protected String _getRevision() { +return info.getProperty("revision", "Unknown"); + } + + protected String _getBranch() { +return info.getProperty("branch", "Unknown"); + } + + protected String _getDate() { +return info.getProperty("date", "Unknown"); + } + + protected String _getUser() { +return info.getProperty("user", "Unknown"); + } + + protected String _getUrl() { +return info.getProperty("url", "Unknown"); + } + + protected String _getSrcChecksum() { +return info.getProperty("srcChecksum", "Unknown"); + } + + protected String _getBuildVersion(){ +return getVersion() + +" from " + _getRevision() + +" by " + _getUser() + +" source checksum " + _getSrcChecksum(); + } + + private static VersionInfo COMMON_VERSION_INFO = new VersionInfo("spark"); --- End diff -- This whole class seems pretty complex. Why are the accessors protected but exposed through static methods? Would it not be simpler to just have a static getter to a single instance and make all member accessors public? In that case I would also recommend looking into implementing this as a class/singleton object. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
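A sketch of the simpler shape suggested here: load the properties once into a singleton and expose plain public accessors (Scala, with names adapted from the class above):

```scala
import java.util.Properties

object SparkVersionInfo {
  // Loaded once, lazily, on first access to the object.
  private val info: Properties = {
    val props = new Properties()
    val is = Thread.currentThread().getContextClassLoader
      .getResourceAsStream("spark-version-info.properties")
    if (is != null) {
      try props.load(is) finally is.close()
    }
    props
  }

  // Public accessors replace the protected-member-plus-static-forwarder pairs.
  def version: String = info.getProperty("version", "Unknown")
  def revision: String = info.getProperty("revision", "Unknown")
  def branch: String = info.getProperty("branch", "Unknown")
  def buildVersion: String =
    s"$version from $revision by ${info.getProperty("user", "Unknown")}"
}
```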
[GitHub] spark pull request: [SPARK-14279] Spark Version will be picked fro...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/12152#discussion_r63969585 --- Diff: core/src/main/java/org/apache/spark/VersionInfo.java --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.InputStream; +import java.util.Properties; + +public class VersionInfo { + + private static final Logger LOG = LoggerFactory.getLogger(VersionInfo.class); + private Properties info; + + protected VersionInfo(String component) { +info = new Properties(); +String versionInfoFile = component + "-version-info.properties"; +InputStream is = null; +try { + is = Thread.currentThread().getContextClassLoader() + .getResourceAsStream(versionInfoFile); + if (is == null) { +throw new IOException("Resource not found"); + } + info.load(is); +} catch (IOException ex) { +} finally { + if (is != null) { +try { + is.close(); +} catch (IOException ioex) { +} + } +} + } + + protected String _getVersion() { +return info.getProperty("version", "Unknown"); + } + + protected String _getRevision() { +return info.getProperty("revision", "Unknown"); + } + + protected String _getBranch() { +return info.getProperty("branch", "Unknown"); + } + + protected String _getDate() { +return info.getProperty("date", "Unknown"); + } + + protected String _getUser() { +return info.getProperty("user", "Unknown"); + } + + protected String _getUrl() { +return info.getProperty("url", "Unknown"); + } + + protected String _getSrcChecksum() { +return info.getProperty("srcChecksum", "Unknown"); + } + + protected String _getBuildVersion(){ +return getVersion() + +" from " + _getRevision() + +" by " + _getUser() + +" source checksum " + _getSrcChecksum(); + } + + private static VersionInfo COMMON_VERSION_INFO = new VersionInfo("spark"); + + public static String getVersion() { +return COMMON_VERSION_INFO._getVersion(); + } + + public static String getRevision() { +return COMMON_VERSION_INFO._getRevision(); + } + + public static String getBranch() { +return COMMON_VERSION_INFO._getBranch(); + } + + public static String getDate() { +return COMMON_VERSION_INFO._getDate(); + } + + public static String getUser() { +return COMMON_VERSION_INFO._getUser(); + } + + public static String getUrl() { +return COMMON_VERSION_INFO._getUrl(); + } + + public static String getSrcChecksum() { +return COMMON_VERSION_INFO._getSrcChecksum(); + } + + public static String getBuildVersion(){ +return COMMON_VERSION_INFO._getBuildVersion(); + } + + public static void main(String[] args) { --- End diff -- Is the main method in this utility class ever used? 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14279] Spark Version will be picked fro...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/12152#discussion_r63969372 --- Diff: core/src/main/java/org/apache/spark/VersionInfo.java --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.InputStream; +import java.util.Properties; + +public class VersionInfo { + + private static final Logger LOG = LoggerFactory.getLogger(VersionInfo.class); --- End diff -- this field is never used --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14279] [Build] Pick the spark version f...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/13061#discussion_r6396 --- Diff: project/SparkBuild.scala --- @@ -435,7 +438,19 @@ object SparkBuild extends PomBuild { else x.settings(Seq[Setting[_]](): _*) } ++ Seq[Project](OldDeps.project) } +} +object Core { + lazy val settings = Seq( +unmanagedResourceDirectories in Compile += baseDirectory.value / "target" / "extra-resources", +resourceGenerators in Compile += Def.task { + val buildScript = baseDirectory.value + "/../build/spark-build-info" + val targetDir = baseDirectory.value + "/target/extra-resources/" + val command = buildScript + " " + targetDir + " " + version.value + Process(Seq("/bin/bash", "-c", command)).!! --- End diff -- Is the extra unmanagedResourceDirectory required? Alternatively, the resource generator could output the extra properties file into the `resourceManaged` directory (defaults to "resource_managed"), that way it would also automatically be picked up in the classpath. Btw, I think that generating the file into `resourceManaged` would solve the problem with clean you reported on the mailing list. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14511][Build] Upgrade genjavadoc to lat...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12707#issuecomment-215530295 @srowen, I did quite some digging through the genjavadoc codebase when re-implementing @mengxr's initial fix. I can try to fix the group warnings and object privacy issues, however I can't give you a fixed deadline since I won't be available until the 10th. Since the Spark 2.0 freeze is coming up, do you still want to merge this (since it supports Scala 2.12)? I'll fix it regardless for a future version. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10001][Core] Don't short-circuit action...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/12745#discussion_r61370492 --- Diff: core/src/main/scala/org/apache/spark/util/SignalUtils.scala --- @@ -94,7 +94,7 @@ private[spark] object SignalUtils extends Logging { // run all actions, escalate to parent handler if no action catches the signal // (i.e. all actions return false) - val escalate = actions.asScala.forall { action => !action() } + val escalate = actions.asScala.map(action => action()).forall(_ == false) --- End diff -- good idea --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10001][Core] Don't short-circuit action...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12745#issuecomment-215302197 removed the label, sorry about that --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10001][Core][Hotfix] Don't short-circui...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12745#issuecomment-215272244 Looks like a flaky test. jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14511][Build] Upgrade genjavadoc to lat...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12707#issuecomment-215258121 Hmm, the current genjavadoc also produces errors so I didn't really look into it. How does the final output compare? Could this be due to java 8's stricter behaviour as @srowen mentioned? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10001][Core][Hotfix] Don't short-circui...
GitHub user jodersky opened a pull request: https://github.com/apache/spark/pull/12745 [SPARK-10001][Core][Hotfix] Don't short-circuit actions in signal handlers ## What changes were proposed in this pull request? The current signal handlers have a subtle bug that stops evaluating registered actions as soon as one of them returns true; this is because `forall` is short-circuited. This PR adds a strict mapping stage before evaluating the returned results. There are no known occurrences of the bug and this is a preemptive fix. ## How was this patch tested? As with the original introduction of signal handlers, this was tested manually (unit testing with signals is not straightforward). You can merge this pull request into a Git repository by running: $ git pull https://github.com/jodersky/spark SPARK-10001-hotfix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12745.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12745 commit 7fe0e54c5e39ff942fd58a16d5e5e309887ee883 Author: Jakob Odersky <ja...@odersky.com> Date: 2016-04-27T22:26:41Z Evaluate all actions in signal handlers (don't short-circuit) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
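The difference in behavior, as a standalone snippet (a demonstration, not the Spark code itself):

```scala
val actions = Seq(
  () => { println("first handler"); true },  // handles the signal
  () => { println("second handler"); false }
)

// Short-circuiting: forall stops at the first predicate that yields false,
// i.e. at the first action returning true, so "second handler" never runs.
val escalateBuggy = actions.forall(action => !action())

// Strict: map forces every action to run before the results are folded.
val escalate = actions.map(action => action()).forall(_ == false)
```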
[GitHub] spark pull request: [SPARK-14661] [MLlib] trim PCAModel by require...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/12419#discussion_r61335359 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -379,15 +379,21 @@ class RowMatrix @Since("1.0.0") ( * * Note that this cannot be computed on matrices with more than 65535 columns. * - * @param k number of top principal components. + * @param filter either the number of top principal components or variance + * retained by the minimal set of principal components. * @return a matrix of size n-by-k, whose columns are principal components, and * a vector of values which indicate how much variance each principal component * explains */ @Since("1.6.0") - def computePrincipalComponentsAndExplainedVariance(k: Int): (Matrix, Vector) = { + def computePrincipalComponentsAndExplainedVariance(filter: Either[Int, Double]) --- End diff -- Not sure about the breakage; nevertheless, I would recommend implementing a new method regardless. I find the method's parameter type `Either[Int, Double]` to be quite confusing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215189992 Could you change the title to `[SPARK-14954] (current title)`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Add PARTITION BY and BUCKET BY clause for data...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12734#issuecomment-215186982 Does this PR fix a ticket? In that case it would be useful to change the title to include the [SPARK-] prefix so that the JIRA status gets updated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14661] [MLlib] trim PCAModel by require...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/12419#discussion_r61313031 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -379,15 +379,21 @@ class RowMatrix @Since("1.0.0") ( * * Note that this cannot be computed on matrices with more than 65535 columns. * - * @param k number of top principal components. + * @param filter either the number of top principal components or variance + * retained by the minimal set of principal components. * @return a matrix of size n-by-k, whose columns are principal components, and * a vector of values which indicate how much variance each principal component * explains */ @Since("1.6.0") - def computePrincipalComponentsAndExplainedVariance(k: Int): (Matrix, Vector) = { + def computePrincipalComponentsAndExplainedVariance(filter: Either[Int, Double]) --- End diff -- I'm no expert in the ML domain, but from a user perspective, this breaks API backwards compatibility. An alternative could be to create a new method and factor out common behaviour shared with the current `computePrincipalComponentsAndExplainedVariance` into a private utility method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
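A sketch of the overload-based alternative; the public signatures follow the diff, while the private helper and its parameters are hypothetical:

```scala
import org.apache.spark.mllib.linalg.{Matrix, Vector}

class RowMatrixSketch {
  // Two explicit entry points instead of one Either[Int, Double] parameter:
  // callers state their intent in the signature, not in a Left/Right wrapper.
  def computePrincipalComponentsAndExplainedVariance(k: Int): (Matrix, Vector) =
    computePCA(maxComponents = Some(k), minVariance = None)

  def computePrincipalComponentsAndExplainedVariance(variance: Double): (Matrix, Vector) =
    computePCA(maxComponents = None, minVariance = Some(variance))

  // Hypothetical shared helper holding the factored-out common logic.
  private def computePCA(maxComponents: Option[Int],
                         minVariance: Option[Double]): (Matrix, Vector) = ???
}
```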
[GitHub] spark pull request: [SPARK-14511][Build] Upgrade genjavadoc to lat...
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12707#issuecomment-215010184

The work I submitted upstream doesn't specifically target that issue; however, it is plausible that it fixes the private-interface issue as a side effect. I haven't checked it, though.

On Wed, Apr 27, 2016 at 1:14 AM, Sean Owen <notificati...@github.com> wrote:

> Seems reasonable to me if the upstream version does all that's needed.
> I wonder if this new version happens to fix the problem in ...
> https://issues.apache.org/jira/browse/SPARK-3359
[GitHub] spark pull request: [MINOR][MAINTENANCE] Deprecation fixes.
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12653#issuecomment-214857003

If I recall correctly, Scala 2.10 will stay for 2.0.0 (probably not for all future 2.x releases, though; it could be removed as soon as 2.1.0). However, I see no problem in reopening the issue when the time comes; you or any committer can do that. I would suggest you rename the PR to include the JIRA number, something like `[SPARK-14417] Deprecation fixes from Scala 2.10`, before you close it. That way it will be linked from the issue tracker in the future. Thanks for working on this!
[GitHub] spark pull request: Upgrade genjavadoc and use upstream version
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12707#issuecomment-214849927

cc @mengxr @JoshRosen, is the Maven build OK? I could only find references to genjavadoc in the sbt build.
[GitHub] spark pull request: Upgrade genjavadoc and use upstream version
GitHub user jodersky opened a pull request:

https://github.com/apache/spark/pull/12707

Upgrade genjavadoc and use upstream version

## What changes were proposed in this pull request?

In the past, genjavadoc had issues with package-private members, which led the Spark project to use a forked version. This issue has been fixed upstream (typesafehub/genjavadoc#70) and a release is available for Scala versions 2.10, 2.11 **and 2.12**, hence a forked version for Spark is no longer necessary. This pull request updates the build configuration to use the newest upstream genjavadoc.

## How was this patch tested?

The build was run with `sbt unidoc`. During the process, javadoc emits some errors on the generated Java stubs; however, these errors were also present before the upgrade. Furthermore, the produced HTML is fine.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jodersky/spark SPARK-14511-genjavadoc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12707

commit f7d649c47d0d6f082b5a6237e4b60e8bd8cf3716
Author: Jakob Odersky <ja...@odersky.com>
Date: 2016-04-25T22:58:40Z

    Upgrade genjavadoc and use upstream version
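For context, genjavadoc is a Scala compiler plugin that emits Java stubs for Scala sources, which javadoc (for example via sbt-unidoc) then processes. A minimal sketch of the typical sbt wiring follows; the version number is illustrative and this is not Spark's exact build definition.

```scala
// build.sbt -- a minimal sketch of typical genjavadoc wiring (illustrative
// version number; not Spark's actual build configuration).

// Compiler plugins must be resolved against the full Scala version.
addCompilerPlugin(
  "com.typesafe.genjavadoc" %% "genjavadoc-plugin" % "0.18" cross CrossVersion.full)

// Direct the plugin to write the generated Java stubs under target/java;
// javadoc (e.g. via sbt-unidoc) can then be pointed at that directory.
Compile / scalacOptions += "-P:genjavadoc:out=" + (target.value / "java")
```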
[GitHub] spark pull request: [SPARK-14347] [WIP] Require Java 8 for Spark 2...
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12165#issuecomment-213328740

@srowen, @rxin, considering the discussions around the matter, should this PR be closed?
[GitHub] spark pull request: Do not allow alter column commands.
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12332#issuecomment-213325682

@yhuai, is this PR still actively used for testing purposes? If not, would you mind closing it? Thanks!
[GitHub] spark pull request: merge
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/12442#issuecomment-213320568

It looks like this PR was opened by mistake. @Sunshineit, @srowen (or anyone with commit rights), could you please close it?