[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 How is this done in other databases? I don't think we want to invent new ways on these basic primitives. --- - To unsubscribe, e

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193230476 --- Diff: R/pkg/NAMESPACE --- @@ -281,6 +281,8 @@ exportMethods("%<=>%", "initcap",

[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...

2018-05-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21448 I'd only move abs and nothing else.

[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-05-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21459 What's driving this (is it Java 9)? I'm in general scared by core library updates like this. Maybe Spark 3.0 is a good time (and we should just do it this year

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, add to whitelist.

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, test this please.

[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21416 LGTM (I didn't look that carefully, though).

[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191306678 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest

[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191306654 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 If we can fix it without breaking existing behavior that would be awesome. On Fri, May 25, 2018 at 9:59 AM Bryan Cutler <notificati...@github.com> wrote: > I've been think

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 On the config part, I haven’t looked at the code but can’t we just reorder the columns on the JVM side? Why do we need to reorder them on the Python side? On Fri, May 25, 2018

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 I agree it should have started as experimental. It is pretty weird to mark something experimental after the fact, though. On Fri, May 25, 2018 at 12:23 AM Hyukjin Kwon <notificati...@github.

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 Why is it difficult? On Fri, May 25, 2018 at 12:03 AM Hyukjin Kwon <notificati...@github.com> wrote: > but as I said it's difficult to have a configuration there. Shal

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803873 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803855 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803772 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803641 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 If this has been released you can't just change it like this; it will break users' programs immediately. At the very least introduce a flag so it can be set by the user to avoid breaking their code
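The position-versus-name question in this thread can be sketched without Spark at all. Below is a minimal pure-Python sketch, not PySpark's implementation, of mapping a returned row onto a declared schema either positionally (the existing behavior) or by column label, with the label-based path kept behind an opt-in flag so existing programs are not broken. `assign_columns` and `by_name` are hypothetical names chosen for illustration.

```python
from typing import Dict, Sequence


def assign_columns(
    schema: Sequence[str],
    result_names: Sequence[str],
    result_values: Sequence[object],
    by_name: bool = False,
) -> Dict[str, object]:
    """Map one returned row onto the declared schema (hypothetical helper).

    by_name=False keeps positional assignment: value i goes to schema field i.
    by_name=True looks values up by the result's own column labels.
    """
    if len(result_values) != len(schema):
        raise ValueError("result has a different number of columns than the schema")
    if not by_name:
        # Existing behavior: ignore the result's labels and zip by position.
        return dict(zip(schema, result_values))
    # Opt-in behavior: reorder according to the result's column labels.
    lookup = dict(zip(result_names, result_values))
    missing = [field for field in schema if field not in lookup]
    if missing:
        raise KeyError(f"result is missing columns: {missing}")
    return {field: lookup[field] for field in schema}


# The UDF returned its columns in a different order than the schema declares.
schema = ["id", "score"]
print(assign_columns(schema, ["score", "id"], [0.9, 7]))                # {'id': 0.9, 'score': 7}
print(assign_columns(schema, ["score", "id"], [0.9, 7], by_name=True))  # {'id': 7, 'score': 0.9}
```

Defaulting the flag to `False` is what keeps already-released behavior intact; users opt in to name-based assignment explicitly.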

[GitHub] spark issue #21242: [SPARK-23657][SQL] Document and expose the internal data...

2018-05-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21242 Thanks Ryan. I'm not a fan of just exposing internal classes like this. The APIs haven't really been designed or audited for the purpose of external consumption. If we want to expose the internal APIs

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189669772 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21370 Can we also do something a bit more generic that works for non-Jupyter notebooks as well? For example, in IPython or just the plain Python REPL

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21329 Why are we cleaning up stuff like this?

[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...

2018-05-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21192 My point is that I don't consider a sequence of chars an array to begin with. It is not natural to me. I'd want an array if it is a different set of separators

[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...

2018-05-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21192 eh I actually think separated makes it much simpler to look at, compared with an array. Why complicate the API and require users to understand how to specify an array (in all languages)? One
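To illustrate the point that a multi-character separator is just a string, here is a small pure-Python sketch of splitting input on a `lineSep`-style option supplied as one string. `split_records` is a hypothetical helper, and dropping the empty record after a trailing separator is this sketch's own choice, not a claim about how Spark's readers behave.

```python
from typing import List


def split_records(text: str, line_sep: str) -> List[str]:
    """Split text into records on a separator given as one plain string.

    A multi-character separator such as "||" or a CRLF pair is still a
    single string, not an array of characters (hypothetical helper).
    """
    if not line_sep:
        raise ValueError("line_sep must be non-empty")
    records = text.split(line_sep)
    # Sketch's choice: drop the empty record left by a terminating separator.
    if records and records[-1] == "":
        records.pop()
    return records


print(split_records("a||b||c||", "||"))  # ['a', 'b', 'c']
```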

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-05-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 It's still going to fail because I haven't updated it yet. Will do tomorrow.

[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r188104204 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1607,7 +1607,9 @@ class Dataset[T] private[sql]( */ @Experimental

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-05-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 Hm, the failure doesn't look like it's caused by this PR. Do you guys know what's going on?

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-05-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 cc @gatorsmile @HyukjinKwon

[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...

2018-05-13 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21318 [minor] Update docs for functions.scala to make it clear not all the built-in functions are defined there The title summarizes the change. You can merge this pull request into a Git repository

[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r187838099 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1607,7 +1607,9 @@ class Dataset[T] private[sql]( */ @Experimental

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 Better compile time error. Plus a lot of people are already using these. On Fri, May 11, 2018 at 7:35 PM Hyukjin Kwon <notificati...@github.com> wrote: > Yup, then why

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 Adding it to sql would allow it to be available everywhere (through expr) right? On Fri, May 11, 2018 at 7:30 PM Hyukjin Kwon <notificati...@github.com> wrote: > Thing

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 Btw, it’s always been the case that the less commonly used functions are not part of this file. There is just a lot of overhead in maintaining all of them. I’m not even sure

[GitHub] spark issue #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21054 There is not a single function that can’t be called by expr. It mainly adds some type safety. On Fri, May 11, 2018 at 7:18 PM Hyukjin Kwon <notificati...@github.com>

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 cc @gatorsmile @mgaido91

[GitHub] spark pull request #21309: [SPARK-23907] Removes regr_* functions in functio...

2018-05-11 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21309 [SPARK-23907] Removes regr_* functions in functions.scala ## What changes were proposed in this pull request? This patch removes the various regr_* functions in functions.scala. They are so

[GitHub] spark pull request #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21054#discussion_r187751801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -775,6 +775,178 @@ object functions { */ def var_pop(columnName

[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index

2018-05-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21121 @lokm01 wouldn't @ueshin's suggestion on adding a second parameter to transform work for you? You can just do something similar to `transform(x, (entry, index) -> struct(entry, index))`. Perh
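A pure-Python model (not Spark code) of the two-argument `transform` suggested here: the lambda receives each element together with its position. It assumes a 0-based index, matching the Spark SQL documentation example `transform(array(1, 2, 3), (x, i) -> x + i)`, which returns `[1, 3, 5]`, and uses a Python tuple in place of `struct`. `zip_with_index` is a hypothetical helper named after the PR.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def transform(xs: Sequence[T], f: Callable[[T, int], R]) -> List[R]:
    """Model of an indexed transform: apply f(element, index), index 0-based."""
    return [f(x, i) for i, x in enumerate(xs)]


def zip_with_index(xs: Sequence[T]) -> List[Tuple[T, int]]:
    # Hypothetical helper: transform(x, (entry, index) -> struct(entry, index)),
    # with a Python tuple standing in for the SQL struct.
    return transform(xs, lambda entry, index: (entry, index))


print(transform([1, 2, 3], lambda x, i: x + i))  # [1, 3, 5]
print(zip_with_index(["a", "b"]))                # [('a', 0), ('b', 1)]
```

The design point is that a general indexed `transform` subsumes a dedicated `zip_with_index` function.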

[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

2018-04-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21187#discussion_r185084802 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/PivotSuite.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21169#discussion_r184596334 --- Diff: docs/sql-programming-guide.md --- @@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-04-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20560 Just saw this - this seems like a somewhat awkward way to do it by just matching on filter / project. Is the main thing lacking a way to do back propagation for properties? (We can only do forward

[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21071 @devaraj-kavali can you close this PR first? Looks like there isn't any reason to really use htrace anymore

[GitHub] spark issue #19222: [SPARK-10399][SPARK-23879][CORE][SQL] Introduce multiple...

2018-04-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19222 @kiszk do you have more data now?

[GitHub] spark issue #19222: [SPARK-10399][SPARK-23879][CORE][SQL] Introduce multiple...

2018-04-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19222 OK, thanks, please do that. Does TPC-DS even trigger 2 call sites? E.g. ByteArrayMemoryBlock and OnHeapMemoryBlock. Even there it might introduce a conditional branch after JIT that could lead to perf

[GitHub] spark issue #19222: [SPARK-10399][SPARK-23879][CORE][SQL] Introduce multiple...

2018-04-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19222 Sorry this thread is too long for me to follow. I might be bringing up a point that has been brought up before. @kiszk did your perf tests take into account megamorphic callsites? It seems

[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 Thanks @jcuquemelle

[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21071 This probably deserves its own SPIP. Also unclear whether we should just support htrace, or have an extension api so users can plug in whatever they want

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21060 It looks to me like this is a bug fix that could merit backporting, as QueryExecutionListener is also marked as experimental. In this case, I think @gatorsmile is worried one might have written

[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...

2018-04-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20992 What are the performance improvements? Without additional data this seems like just an invasive change without any real benefits

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21031 If there is already size, why do we need to create a new implementation? Why can't we just rewrite cardinality to size? Also I wouldn't add any programming API for this, since
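The "rewrite cardinality to size" suggestion amounts to registering one implementation under two names. The sketch below models that with a plain Python dictionary; `REGISTRY` and `register` are hypothetical stand-ins, not Spark's `FunctionRegistry`, and null handling is deliberately left out.

```python
from typing import Callable, Dict, Sequence, Union

Collection = Union[Sequence[object], Dict[object, object]]

# Hypothetical name -> implementation registry standing in for a SQL
# function registry; not Spark's actual FunctionRegistry.
REGISTRY: Dict[str, Callable[[Collection], int]] = {}


def register(name: str, fn: Callable[[Collection], int]) -> None:
    REGISTRY[name] = fn


def size(c: Collection) -> int:
    """Number of elements in an array or entries in a map."""
    return len(c)


register("size", size)
# Instead of a second implementation, "cardinality" resolves to the same
# function object, so the two names can never drift apart.
register("cardinality", REGISTRY["size"])

print(REGISTRY["cardinality"]([10, 20, 30]))  # 3
```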

[GitHub] spark pull request #21056: [SPARK-23849][SQL] Tests for samplingRatio of jso...

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21056#discussion_r181530121 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2128,38 +2128,60 @@ class JsonSuite extends

[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181529978 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest

[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181529901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r181529318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcDataSourceV2.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed

[1/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.

2018-04-12 Thread rxin
Repository: spark-website Updated Branches: refs/heads/asf-site 91b561749 -> 658467248 http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/strata-exercises-now-available-online.html -- diff --git

[2/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.

2018-04-12 Thread rxin
Update text/wording to more "modern" Spark and more consistent. 1. Use DataFrame examples. 2. Reduce explicit comparison with MapReduce, since the topic does not really come up. 3. More focus on analytics rather than "cluster compute". 4. Update committer affiliation. 5. Make it more clear

[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 I thought about this more, and I actually think something like this makes more sense: `executorAllocationRatio`. Basically it is just a ratio that determines how aggressive we want Spark to request
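A sketch of how a ratio-style setting could scale the executor target, assuming the target is the number of pending plus running tasks, multiplied by the ratio, divided by the number of tasks one executor can run at once, and rounded up. Parameter names are illustrative, not Spark's internal names; multiplying by a float in (0, 1] reflects the suggestion in this thread to use a factor rather than a divisor.

```python
import math


def max_executors_needed(
    pending_or_running_tasks: int,
    executor_cores: int,
    task_cpus: int = 1,
    allocation_ratio: float = 1.0,
) -> int:
    """Sketch of a ratio-based executor target (illustrative names).

    allocation_ratio=1.0 asks for enough executors for full parallelism;
    smaller values ask for proportionally fewer.
    """
    if not 0.0 < allocation_ratio <= 1.0:
        raise ValueError("allocation_ratio must be in (0, 1]")
    tasks_per_executor = executor_cores // task_cpus
    if tasks_per_executor < 1:
        raise ValueError("each executor must be able to run at least one task")
    return math.ceil(pending_or_running_tasks * allocation_ratio / tasks_per_executor)


print(max_executors_needed(1000, executor_cores=4))                        # 250
print(max_executors_needed(1000, executor_cores=4, allocation_ratio=0.5))  # 125
```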

[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 SGTM on divisor. Do we need "full" there in the config?

[GitHub] spark issue #20045: [Spark-22360][SQL][TEST] Add unit tests for Window Speci...

2018-04-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20045 Can we add them to the file-based test suites instead?

[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 Maybe instead of "divisor", we just have a "rate" or "factor" that can be floating point value, and use multiplication rather than division? This way people can als

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20937 Seems fine to me ...

[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...

2018-04-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20959 I'm good with having this option given the data @MaxGekk posted. (I haven't reviewed the code - somebody else should do that before merging). `val sampledSchema = spark.read.option
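To show the trade-off a `samplingRatio` option controls, here is a toy pure-Python sketch: rows are kept with probability `sampling_ratio` (a Bernoulli sample with a fixed seed), and a type is inferred from only the kept rows by widening int to double to string. This is an illustration under those assumptions, not Spark's CSV inference code; both function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_rows(rows: Sequence[str], sampling_ratio: float, seed: int = 0) -> List[str]:
    """Bernoulli sample: keep each row independently with probability sampling_ratio."""
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be in (0, 1]")
    if sampling_ratio == 1.0:
        return list(rows)
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < sampling_ratio]


def infer_type(values: Sequence[str]) -> Optional[str]:
    """Toy inference: widen int -> double -> string; None if no values were seen."""
    order = ["int", "double", "string"]
    inferred: Optional[str] = None
    for v in values:
        try:
            int(v)
            t = "int"
        except ValueError:
            try:
                float(v)
                t = "double"
            except ValueError:
                t = "string"
        if inferred is None or order.index(t) > order.index(inferred):
            inferred = t
    return inferred


rows = [str(i) for i in range(1000)] + ["3.5"]
print(infer_type(sample_rows(rows, 1.0)))  # double
```

With `sampling_ratio=1.0` every row is inspected and the single `"3.5"` widens the column to `double`; at lower ratios that row may be skipped and the inferred type may stay `int`, which is the accuracy cost traded for faster inference.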

[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-03-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 Can you wait another day? I just find the name pretty weird. Do we have other configs that use the “divisor” suffix? On Wed, Mar 28, 2018 at 7:23 AM Tom Graves <notificati...@github.

[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20877 We can also change both if they haven’t been released yet. On Sun, Mar 25, 2018 at 10:37 AM Maxim Gekk <notificati...@github.com> wrote: > @gatorsmile <https:

[GitHub] spark issue #20731: [SPARK-23579][Documentation] Added context model image a...

2018-03-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20731 Yea we gotta be careful with adding commercial vendor logos here. It's part of the complexity we need to navigate being hosted at the Apache Software Foundation. The project needs to be very vendor

[GitHub] spark pull request #20774: [SPARK-23549][SQL] Cast to timestamp when compari...

2018-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20774#discussion_r175335072 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -479,6 +479,15 @@ object SQLConf { .checkValues

[GitHub] spark pull request #20774: [SPARK-23549][SQL] Cast to timestamp when compari...

2018-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20774#discussion_r175334948 --- Diff: sql/core/src/test/resources/sql-tests/inputs/predicate-functions.sql --- @@ -39,3 +43,4 @@ select 2.0 <= '2.2'; select 0.5 <

[2/2] spark-website git commit: Squashed commit of the following:

2018-03-16 Thread rxin
Squashed commit of the following: commit 8e2dd71cf5613be6f019bb76b46226771422a40e Merge: 8bd24fb6d 01f0b4e0c Author: Reynold Xin Date: Fri Mar 16 10:24:54 2018 -0700 Merge pull request #104 from mateiz/history Add a project history page commit

[1/2] spark-website git commit: Squashed commit of the following:

2018-03-16 Thread rxin
Repository: spark-website Updated Branches: refs/heads/asf-site 8bd24fb6d -> a1d84bcbf http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2016-agenda-posted.html -- diff --git

[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-03-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20800 So the API looks useful, but I don't know if this is the right implementation. How important is it to add this? It seems like the value is not super high either
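On the implementation question: an emptiness check only needs to know whether a first row exists, much like taking one row rather than counting all of them. The pure-Python sketch below shows that idea on an iterator; it is not Spark's `Dataset.isEmpty`, and `counted` is a hypothetical helper used only to show how many rows get produced.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

_SENTINEL = object()


def is_empty(rows: Iterable[T]) -> bool:
    """Stop after looking for the first row instead of counting every row."""
    return next(iter(rows), _SENTINEL) is _SENTINEL


def counted(n: int, log: List[int]) -> Iterator[int]:
    # Hypothetical generator that records each row it actually produces.
    for i in range(n):
        log.append(i)
        yield i


log: List[int] = []
print(is_empty(counted(1_000_000, log)))  # False
print(len(log))                           # 1
print(is_empty([]))                       # True
```

Only one row is materialized before the answer is known, which is the property a cheap `isEmpty` needs.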

[GitHub] spark pull request #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet

2018-03-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20800#discussion_r174016939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -511,6 +511,14 @@ class Dataset[T] private[sql]( */ def isLocal

[GitHub] spark issue #20674: [SPARK-23465][SQL] Introduce new function to rename colu...

2018-03-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20674 I personally wouldn't include this since it's a simple function users can write ...
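As an example of the "simple function users can write" point, here is a hypothetical helper that computes renamed column names from a mapping, in plain Python on a list of names.

```python
from typing import Dict, List, Sequence


def rename_columns(columns: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Hypothetical helper: apply old -> new renames; unmapped names are kept."""
    unknown = set(mapping) - set(columns)
    if unknown:
        raise KeyError(f"cannot rename missing columns: {sorted(unknown)}")
    return [mapping.get(c, c) for c in columns]


print(rename_columns(["id", "ts", "val"], {"ts": "timestamp"}))  # ['id', 'timestamp', 'val']
```

Against an actual PySpark DataFrame, the same idea is a loop over the existing `withColumnRenamed` method, e.g. `for old, new in mapping.items(): df = df.withColumnRenamed(old, new)`.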

[GitHub] spark pull request #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20706#discussion_r171666996 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -267,44 +264,20 @@ private[spark] object Utils extends Logging

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to non-Arr...

2018-02-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20567 A quick bit: fallback is a single word.

[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r167137165 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -62,6 +62,16 @@ */ DataWriterFactory

[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

2018-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20499 I'd fix this in 2.3, and 2.2.1 as well.

[GitHub] spark pull request #20535: [SPARK-23341][SQL] define some standard options f...

2018-02-07 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20535#discussion_r166701501 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceOptions.java --- @@ -27,6 +27,39 @@ /** * An immutable string-to-string

[GitHub] spark issue #20491: [SQL] Minor doc update: Add an example in DataFrameReade...

2018-02-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20491 This should also go into branch-2.3.

[GitHub] spark pull request #20491: [SQL] Minor doc update: Add an example in DataFra...

2018-02-02 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/20491 [SQL] Minor doc update: Add an example in DataFrameReader.schema ## What changes were proposed in this pull request? This patch adds a small example to the schema string definition of schema

[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

2018-02-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16793 Also the implementation doesn't match what was proposed in https://issues.apache.org/jira/browse/SPARK-19454 Having null value as the default in a function called replace is too risky

[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

2018-02-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16793 Sorry, I object to this change. Why would we put null as the default replace value, in a function called replace? That seems very counterintuitive and error prone
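The objection is to a `None` default that silently means "replace with null". A common Python pattern for this is a private sentinel default, so that omitting the argument can be rejected while an explicit `None` still means null. The sketch below shows the pattern on a plain list; `replace_values` is hypothetical and is not PySpark's `DataFrame.replace`.

```python
from typing import Any, Dict, List

_MISSING = object()  # marks "argument not supplied"; distinct from None


def replace_values(values: List[Any], to_replace: Any, value: Any = _MISSING) -> List[Any]:
    """Hypothetical replace helper; values are assumed hashable scalars."""
    if isinstance(to_replace, dict):
        # A mapping carries its own replacement values, so `value` is not used.
        mapping: Dict[Any, Any] = to_replace
    elif value is _MISSING:
        raise TypeError("value is required; pass None explicitly to replace with null")
    else:
        mapping = {to_replace: value}
    return [mapping[v] if v in mapping else v for v in values]


print(replace_values([1, 2, 1], 1, 9))     # [9, 2, 9]
print(replace_values([1, 2, 1], 1, None))  # [None, 2, None]
```

Forgetting the second argument now raises an error instead of nulling out data.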

[GitHub] spark issue #20219: [SPARK-23025][SQL] Support Null type in scala reflection

2018-01-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20219 But it is possible to generate NullType data, right?

[GitHub] spark issue #20152: [SPARK-22957] ApproxQuantile breaks if the number of row...

2018-01-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20152 cc @gatorsmile @cloud-fan

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2018-01-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159573530 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf

[GitHub] spark issue #20076: [SPARK-21786][SQL] When acquiring 'compressionCodecClass...

2017-12-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20076 Thanks for the PR. Why are we complicating the PR by doing the rename? Does this actually gain anything other than minor cosmetic changes? It makes the simple PR pretty long

spark git commit: [SPARK-22648][K8S] Spark on Kubernetes - Documentation

2017-12-21 Thread rxin
our fork. Rest is documentation. cc rxin mateiz (shepherd) k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko reviewers: vanzin felixcheung jiangxb1987 mridulm TODO: - [x] Add dockerfiles directory t

[GitHub] spark issue #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documentation

2017-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19946 Merging in master.

[GitHub] spark pull request #19973: [SPARK-22779] FallbackConfigEntry's default value...

2017-12-21 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/19973

[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...

2017-12-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r158205893 --- Diff: docs/building-spark.md --- @@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the to be runnable, use `./dev/make

[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...

2017-12-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20014 Overall change LGTM.

[GitHub] spark pull request #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryErr...

2017-12-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20014#discussion_r157673852 --- Diff: core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19973: [SPARK-22779] FallbackConfigEntry's default value should...

2017-12-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19973 @vanzin you got a min to submit a patch?

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r156821519 --- Diff: docs/building-spark.md --- @@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the to be runnable, use `./dev/make

[GitHub] spark issue #19973: [SPARK-22779] FallbackConfigEntry's default value should...

2017-12-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19973 That's what the "default" is, isn't it?

[GitHub] spark issue #19973: [SPARK-22779] ConfigEntry's default value should actuall...

2017-12-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19973 The issue is in ``` /** * Return the `string` value of Spark SQL configuration property for the given key. If the key is * not set yet, return `defaultValue

[GitHub] spark pull request #19973: [SPARK-22779] ConfigEntry's default value should ...

2017-12-13 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19973 [SPARK-22779] ConfigEntry's default value should actually be a value ## What changes were proposed in this pull request? ConfigEntry's config value right now shows a human readable message
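A toy model of the distinction in this PR: an entry whose default falls back to another entry should report that other entry's resolved value, not a descriptive string about it. The `ConfigEntry` class and the `spark.example.*` keys below are hypothetical Python stand-ins, not Spark's Scala `ConfigEntry` or `FallbackConfigEntry`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical entry: a key plus either a literal default or a fallback entry."""

    key: str
    default: Optional[str] = None
    fallback: Optional["ConfigEntry"] = None

    def default_value_string(self, settings: Dict[str, str]) -> Optional[str]:
        # A fallback entry's default is whatever its fallback resolves to:
        # an actual value, not a message such as "<value of parent>".
        if self.fallback is not None:
            return self.fallback.read(settings)
        return self.default

    def read(self, settings: Dict[str, str]) -> Optional[str]:
        """Return the explicitly set value, else the resolved default."""
        if self.key in settings:
            return settings[self.key]
        return self.default_value_string(settings)


parent = ConfigEntry("spark.example.parent", default="10")  # hypothetical keys
child = ConfigEntry("spark.example.child", fallback=parent)

print(child.read({}))                              # 10
print(child.read({"spark.example.parent": "20"}))  # 20
print(child.read({"spark.example.child": "5"}))    # 5
```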

[GitHub] spark issue #19973: [SPARK-22779] ConfigEntry's default value should actuall...

2017-12-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19973 cc @vanzin @gatorsmile

[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-07 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19861#discussion_r155693977 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...

2017-12-07 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19861#discussion_r155693966 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...

2017-12-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19905 cc @vanzin
