[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user jomach commented on the issue: https://github.com/apache/spark/pull/19429 @gatorsmile I addressed your comments. However, I still cannot get the Jekyll build to work: `SKIP_API=1 jekyll build --incremental Configuration file: /Users/jorge/Downloads/spark/docs/_config.yml Deprecation: The 'gems' configuration option has been renamed to 'plugins'. Please update your config file accordingly. Source: /Users/jorge/Downloads/spark/docs Destination: /Users/jorge/Downloads/spark/docs/_site Incremental build: enabled Generating... Liquid Exception: invalid byte sequence in US-ASCII in _layouts/redirect.html ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
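The `invalid byte sequence in US-ASCII` message typically means that a file containing non-ASCII (for example UTF-8) bytes was read while the default encoding was US-ASCII. The Python sketch below reproduces the same class of failure as an illustration only. A commonly suggested workaround for Jekyll, exporting a UTF-8 locale such as `LC_ALL=en_US.UTF-8` before running `jekyll build`, is an assumption that this thread does not confirm.

```python
# Illustration only: decoding UTF-8 bytes with an ASCII codec fails, which is
# analogous to Jekyll reading a UTF-8 template under a US-ASCII default encoding.
utf8_bytes = "redirect \u2192 page".encode("utf-8")  # contains a non-ASCII arrow


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short error description if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


print(try_decode(utf8_bytes, "ascii"))  # fails: the arrow is outside ASCII
print(try_decode(utf8_bytes, "utf-8"))  # succeeds
```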
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143633890 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr except Exception: has_pandas = False if has_pandas and isinstance(data, pandas.DataFrame): -if schema is None: -schema = [str(x) for x in data.columns] -data = [r.tolist() for r in data.to_records(index=False)] +if self.conf.get("spark.sql.execution.arrow.enable", "false").lower() == "true" \ --- End diff -- The config was renamed to `spark.sql.execution.arrow.enabled` in commits d29d1e87995e02cb57ba3026c945c3cd66bb06e2 and af8a34c787dc3d68f5148a7d9975b52650bb7729.
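With the corrected key, the check in the diff compares the value of `spark.sql.execution.arrow.enabled` against `"true"` case-insensitively. A minimal sketch of that comparison, assuming the session configuration is modeled as a plain `dict` (in PySpark the real lookup is `spark.conf.get(key, default)`):

```python
# Corrected key; the earlier spelling "spark.sql.execution.arrow.enable" was renamed.
ARROW_ENABLED_KEY = "spark.sql.execution.arrow.enabled"


def arrow_enabled(conf: dict) -> bool:
    """Mirror the diff's check: the flag is on only if its value is "true" (any case).

    `conf` is a plain dict standing in for the Spark runtime config (assumption).
    """
    return conf.get(ARROW_ENABLED_KEY, "false").lower() == "true"


print(arrow_enabled({"spark.sql.execution.arrow.enabled": "TRUE"}))  # True
print(arrow_enabled({"spark.sql.execution.arrow.enable": "true"}))   # False: old key not consulted
```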
[GitHub] spark pull request #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19462
[GitHub] spark issue #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names consiste...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19462 Thanks! Merged to master.
[GitHub] spark issue #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names consiste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19462 Merged build finished. Test PASSed.
[GitHub] spark issue #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names consiste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82575/ Test PASSed.
[GitHub] spark issue #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names consiste...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19462 **[Test build #82575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82575/testReport)** for PR 19462 at commit [`5bef05e`](https://github.com/apache/spark/commit/5bef05e3d84805866103766f6287ecb054dcad68). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143630635 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3376,151 @@ def test_vectorized_udf_empty_partition(self): res = df.select(f(col('id'))) self.assertEquals(df.collect(), res.collect()) +def test_vectorized_udf_varargs(self): +from pyspark.sql.functions import pandas_udf, col +df = self.spark.createDataFrame(self.sc.parallelize([Row(id=1)], 2)) +f = pandas_udf(lambda *v: v[0], LongType()) +res = df.select(f(col('id'))) +self.assertEquals(df.collect(), res.collect()) + + +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed") +class GroupbyApplyTests(ReusedPySparkTestCase): +@classmethod +def setUpClass(cls): +ReusedPySparkTestCase.setUpClass() +cls.spark = SparkSession(cls.sc) + +@classmethod +def tearDownClass(cls): +ReusedPySparkTestCase.tearDownClass() +cls.spark.stop() + +def assertFramesEqual(self, expected, result): +msg = ("DataFrames are not equal: " + + ("\n\nExpected:\n%s\n%s" % (expected, expected.dtypes)) + + ("\n\nResult:\n%s\n%s" % (result, result.dtypes))) +self.assertTrue(expected.equals(result), msg=msg) + +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i) for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))).drop('vs') + +def test_simple(self): +from pyspark.sql.functions import pandas_udf +df = self.data + +foo_udf = pandas_udf( +lambda df: df.assign(v1=df.v * df.id * 1.0, v2=df.v + df.id), +StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('v1', DoubleType()), + StructField('v2', LongType())])) + +result = df.groupby('id').apply(foo_udf).sort('id').toPandas() +expected = df.toPandas().groupby('id').apply(foo_udf.func).reset_index(drop=True) +self.assertFramesEqual(expected, result) + +def test_decorator(self): +from 
pyspark.sql.functions import pandas_udf +df = self.data + +@pandas_udf(StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('v1', DoubleType()), + StructField('v2', LongType())])) +def foo(df): +return df.assign(v1=df.v * df.id * 1.0, v2=df.v + df.id) + +result = df.groupby('id').apply(foo).sort('id').toPandas() +expected = df.toPandas().groupby('id').apply(foo.func).reset_index(drop=True) +self.assertFramesEqual(expected, result) + +def test_coerce(self): +from pyspark.sql.functions import pandas_udf +df = self.data + +foo = pandas_udf( +lambda df: df, +StructType([StructField('id', LongType()), StructField('v', DoubleType())])) + +result = df.groupby('id').apply(foo).sort('id').toPandas() +expected = df.toPandas().groupby('id').apply(foo.func).reset_index(drop=True) +expected = expected.assign(v=expected.v.astype('float64')) +self.assertFramesEqual(expected, result) + +def test_complex_groupby(self): +from pyspark.sql.functions import pandas_udf, col +df = self.data + +@pandas_udf(StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('norm', DoubleType())])) +def normalize(pdf): +v = pdf.v +return pdf.assign(norm=(v - v.mean()) / v.std()) + +result = df.groupby(col('id') % 2 == 0).apply(normalize).sort('id', 'v').toPandas() +pdf = df.toPandas() +expected = pdf.groupby(pdf['id'] % 2 == 0).apply(normalize.func) +expected = expected.sort_values(['id', 'v']).reset_index(drop=True) +expected = expected.assign(norm=expected.norm.astype('float64')) +self.assertFramesEqual(expected, result) + +def test_empty_groupby(self): +from pyspark.sql.functions import pandas_udf, col +df = self.data + +@pandas_udf(StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), +
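`test_complex_groupby` groups by the boolean key `id % 2 == 0` and normalizes `v` within each group as `(v - mean) / std`. The stdlib-only sketch below models that split-apply-combine behavior with rows as dicts instead of DataFrames. It assumes pandas' default sample standard deviation (`ddof=1`), which `statistics.stdev` also computes; `normalize_by_group` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def normalize_by_group(rows, key):
    """Split rows by key(row), add a per-group z-score column 'norm', then recombine."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    out = []
    for group_rows in groups.values():
        vs = [r["v"] for r in group_rows]
        mean = statistics.mean(vs)
        std = statistics.stdev(vs)  # sample std (ddof=1), like pandas Series.std()
        out.extend({**r, "norm": (r["v"] - mean) / std} for r in group_rows)
    # Sort like the test does: by id, then v.
    return sorted(out, key=lambda r: (r["id"], r["v"]))


# Same shape as the test fixture: ids 0..9, each exploded with v in 20..29.
data = [{"id": i, "v": v} for i in range(10) for v in range(20, 30)]
result = normalize_by_group(data, key=lambda r: r["id"] % 2 == 0)
```

Because every group here holds the same values 20..29, each group's `norm` column sums to (approximately) zero.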
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143630813 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -519,3 +519,4 @@ case class CoGroup( outputObjAttr: Attribute, left: LogicalPlan, right: LogicalPlan) extends BinaryNode with ObjectProducer + --- End diff -- little nit: let's revert the unrelated change in this file.
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143630505 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3376,151 @@ def test_vectorized_udf_empty_partition(self): res = df.select(f(col('id'))) self.assertEquals(df.collect(), res.collect()) +def test_vectorized_udf_varargs(self): +from pyspark.sql.functions import pandas_udf, col +df = self.spark.createDataFrame(self.sc.parallelize([Row(id=1)], 2)) +f = pandas_udf(lambda *v: v[0], LongType()) +res = df.select(f(col('id'))) +self.assertEquals(df.collect(), res.collect()) + + +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed") +class GroupbyApplyTests(ReusedPySparkTestCase): +@classmethod +def setUpClass(cls): +ReusedPySparkTestCase.setUpClass() +cls.spark = SparkSession(cls.sc) + +@classmethod +def tearDownClass(cls): +ReusedPySparkTestCase.tearDownClass() +cls.spark.stop() + +def assertFramesEqual(self, expected, result): +msg = ("DataFrames are not equal: " + + ("\n\nExpected:\n%s\n%s" % (expected, expected.dtypes)) + + ("\n\nResult:\n%s\n%s" % (result, result.dtypes))) +self.assertTrue(expected.equals(result), msg=msg) + +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i) for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))).drop('vs') + +def test_simple(self): +from pyspark.sql.functions import pandas_udf +df = self.data + +foo_udf = pandas_udf( +lambda df: df.assign(v1=df.v * df.id * 1.0, v2=df.v + df.id), +StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('v1', DoubleType()), + StructField('v2', LongType())])) + +result = df.groupby('id').apply(foo_udf).sort('id').toPandas() +expected = df.toPandas().groupby('id').apply(foo_udf.func).reset_index(drop=True) +self.assertFramesEqual(expected, result) + +def test_decorator(self): +from 
pyspark.sql.functions import pandas_udf +df = self.data + +@pandas_udf(StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('v1', DoubleType()), + StructField('v2', LongType())])) +def foo(df): +return df.assign(v1=df.v * df.id * 1.0, v2=df.v + df.id) + +result = df.groupby('id').apply(foo).sort('id').toPandas() +expected = df.toPandas().groupby('id').apply(foo.func).reset_index(drop=True) +self.assertFramesEqual(expected, result) + +def test_coerce(self): +from pyspark.sql.functions import pandas_udf +df = self.data + +foo = pandas_udf( +lambda df: df, --- End diff -- ditto: `df` -> `pdf`
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143629848 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,30 +2187,66 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=None, returnType=StringType()): """ -Creates a :class:`Column` expression representing a user defined function (UDF) that accepts -`Pandas.Series` as input arguments and outputs a `Pandas.Series` of the same length. +Creates a vectorized user defined function (UDF). -:param f: python function if used as a standalone function +:param f: user-defined function. A python function if used as a standalone function :param returnType: a :class:`pyspark.sql.types.DataType` object ->>> from pyspark.sql.types import IntegerType, StringType ->>> slen = pandas_udf(lambda s: s.str.len(), IntegerType()) ->>> @pandas_udf(returnType=StringType()) -... def to_upper(s): -... return s.str.upper() -... ->>> @pandas_udf(returnType="integer") -... def add_one(x): -... return x + 1 -... ->>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age")) ->>> df.select(slen("name").alias("slen(name)"), to_upper("name"), add_one("age")) \\ -... .show() # doctest: +SKIP -+--+--++ -|slen(name)|to_upper(name)|add_one(age)| -+--+--++ -| 8| JOHN DOE| 22| -+--+--++ +The user-defined function can define one of the following transformations: + +1. One or more `pandas.Series` -> A `pandas.Series` + + This udf is used with :meth:`pyspark.sql.DataFrame.withColumn` and + :meth:`pyspark.sql.DataFrame.select`. + The returnType should be a primitive data type, e.g., `DoubleType()`. + The length of the returned `pandas.Series` must be of the same as the input `pandas.Series`. + + >>> from pyspark.sql.types import IntegerType, StringType + >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType()) + >>> @pandas_udf(returnType=StringType()) + ... def to_upper(s): + ... return s.str.upper() + ... 
+ >>> @pandas_udf(returnType="integer") + ... def add_one(x): + ... return x + 1 + ... + >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age")) + >>> df.select(slen("name").alias("slen(name)"), to_upper("name"), add_one("age")) \\ + ... .show() # doctest: +SKIP + +--+--++ + |slen(name)|to_upper(name)|add_one(age)| + +--+--++ + | 8| JOHN DOE| 22| + +--+--++ + +2. A `pandas.DataFrame` -> A `pandas.DataFrame` + + This udf is used with :meth:`pyspark.sql.GroupedData.apply`. --- End diff -- Maybe change `This udf is used with` -> `This udf is only used with`, or, probably better, add a `note` here. Without the context, I'd wonder why it does not work like a normal pandas udf.
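The docstring's first contract (`pandas.Series` -> `pandas.Series`) requires the output to have the same length as the input. A stdlib-only sketch of enforcing that check, modeling each Series batch as a Python list; `apply_scalar_udf` is a hypothetical helper, not a PySpark API:

```python
from typing import Callable, List, Sequence


def apply_scalar_udf(func: Callable[..., Sequence], *columns: Sequence) -> List:
    """Apply a vectorized function to whole columns and enforce the same-length contract.

    Hypothetical helper: lists stand in for pandas.Series batches.
    """
    if not columns:
        raise ValueError("at least one input column is required")
    result = list(func(*columns))
    expected = len(columns[0])
    if len(result) != expected:
        raise ValueError(f"UDF returned {len(result)} values, expected {expected}")
    return result


def slen(names):
    # Analogue of the docstring's `s.str.len()`.
    return [len(n) for n in names]


def add_one(ages):
    # Analogue of the docstring's `x + 1`.
    return [a + 1 for a in ages]


print(apply_scalar_udf(slen, ["John Doe"]))  # [8]
print(apply_scalar_udf(add_one, [21]))       # [22]
```

A function that drops or adds rows, such as `lambda s: s[:1]` on a two-element column, raises `ValueError` here, which is the kind of mismatch the same-length rule rules out.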
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143630469 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3376,151 @@ def test_vectorized_udf_empty_partition(self): res = df.select(f(col('id'))) self.assertEquals(df.collect(), res.collect()) +def test_vectorized_udf_varargs(self): +from pyspark.sql.functions import pandas_udf, col +df = self.spark.createDataFrame(self.sc.parallelize([Row(id=1)], 2)) +f = pandas_udf(lambda *v: v[0], LongType()) +res = df.select(f(col('id'))) +self.assertEquals(df.collect(), res.collect()) + + +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed") +class GroupbyApplyTests(ReusedPySparkTestCase): +@classmethod +def setUpClass(cls): +ReusedPySparkTestCase.setUpClass() +cls.spark = SparkSession(cls.sc) + +@classmethod +def tearDownClass(cls): +ReusedPySparkTestCase.tearDownClass() +cls.spark.stop() + +def assertFramesEqual(self, expected, result): +msg = ("DataFrames are not equal: " + + ("\n\nExpected:\n%s\n%s" % (expected, expected.dtypes)) + + ("\n\nResult:\n%s\n%s" % (result, result.dtypes))) +self.assertTrue(expected.equals(result), msg=msg) + +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i) for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))).drop('vs') + +def test_simple(self): +from pyspark.sql.functions import pandas_udf +df = self.data + +foo_udf = pandas_udf( +lambda df: df.assign(v1=df.v * df.id * 1.0, v2=df.v + df.id), +StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('v1', DoubleType()), + StructField('v2', LongType())])) + +result = df.groupby('id').apply(foo_udf).sort('id').toPandas() +expected = df.toPandas().groupby('id').apply(foo_udf.func).reset_index(drop=True) +self.assertFramesEqual(expected, result) + +def test_decorator(self): +from 
pyspark.sql.functions import pandas_udf +df = self.data + +@pandas_udf(StructType( +[StructField('id', LongType()), + StructField('v', IntegerType()), + StructField('v1', DoubleType()), + StructField('v2', LongType())])) +def foo(df): +return df.assign(v1=df.v * df.id * 1.0, v2=df.v + df.id) --- End diff -- little nit: I'd call it `pdf`, partly to avoid shadowing `df` and partly to signal that it is a `pd.DataFrame`.
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143630939 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.plans.logical + +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression} + +/** + * Logical nodes specific to PySpark. + */ --- End diff -- little nit: I'd remove this comment. I think the name already implies what this file contains.
[GitHub] spark issue #19463: Cleanup comment in RDDSuite test
Github user sohum2002 commented on the issue: https://github.com/apache/spark/pull/19463 I just added "Removed one comment from RDDSuite." to the PR description. Will this suffice?
[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...
Github user akopich commented on the issue: https://github.com/apache/spark/pull/18924 @WeichenXu123, could you please notify @jkbradley once again?
[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r143629417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -769,16 +769,21 @@ class CodegenContext { foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): String = { val blocks = new ArrayBuffer[String]() val blockBuilder = new StringBuilder() +val maxLines = SQLConf.get.maxCodegenLinesPerFunction --- End diff -- @kiszk I am just afraid that a new regression could be introduced by this change. Sorry for the delay; I really do not have a better solution. I tend to agree with your original approach: just exclude comment characters from the size estimate. At least that is an improvement and carries less risk of a regression.
[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r143628760 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -769,16 +769,21 @@ class CodegenContext { foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): String = { val blocks = new ArrayBuffer[String]() val blockBuilder = new StringBuilder() +val maxLines = SQLConf.get.maxCodegenLinesPerFunction --- End diff -- @gatorsmile Since making it configurable [takes a long time](https://github.com/apache/spark/pull/19449#discussion_r143385878), can we use a hard-coded parameter instead? Even then, this PR is an improvement, since the size estimate no longer counts comment characters.
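The approach discussed above, estimating generated-code size without counting comment characters and starting a new function once a hard-coded threshold is exceeded, can be sketched as follows. This is an illustrative Python model, not Spark's Scala `CodegenContext.splitExpressions`; the default threshold of 1024 characters, the comment-stripping regex (which ignores string literals), and the function names are assumptions.

```python
import re
from typing import List

# Strip /* ... */ block comments and // line comments (simplified: ignores string literals).
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def code_size_without_comments(code: str) -> int:
    """Estimate code size as the number of characters left after removing comments."""
    return len(_COMMENT_RE.sub("", code))


def split_expressions(expressions: List[str], max_chars: int = 1024) -> List[str]:
    """Group code snippets into blocks, starting a new block once the current one
    exceeds max_chars of comment-free code (hypothetical simplified model)."""
    blocks: List[str] = []
    current: List[str] = []
    size = 0
    for code in expressions:
        if current and size > max_chars:
            blocks.append("\n".join(current))
            current, size = [], 0
        current.append(code)
        size += code_size_without_comments(code)
    if current:
        blocks.append("\n".join(current))
    return blocks


commented = "/* a long explanatory comment that should not count */ x = 1;"
exprs = [commented] * 3
print(code_size_without_comments(commented))           # 7
print(len(split_expressions(exprs, max_chars=10)))     # 2
```

Counting the comments as well would push each snippet well past the 10-character threshold and yield one block per snippet, which is the kind of overestimate the PR aims to avoid.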
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19460
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19460 Thanks! Merged to master.
[GitHub] spark issue #19363: [SPARK-22224][Minor]Override toString of KeyValue/Relati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19363 Merged build finished. Test PASSed.
[GitHub] spark issue #19363: [SPARK-22224][Minor]Override toString of KeyValue/Relati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19363 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82574/ Test PASSed.
[GitHub] spark issue #19363: [SPARK-22224][Minor]Override toString of KeyValue/Relati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19363 **[Test build #82574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82574/testReport)** for PR 19363 at commit [`fe0d64a`](https://github.com/apache/spark/commit/fe0d64a1d5080d10fe6743f725107221acb9dd62). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19460 LGTM
[GitHub] spark issue #19463: Cleanup comment in RDDSuite test
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19463 Could you please update the description to explain why you want to make this change?
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19460 LGTM
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 @BryanCutler, by the way, do you think it is possible to de-duplicate the timezone handling on the Python side if we go with option 1?
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 I think I prefer option 1. Do you have a preference, @ueshin? I believe you have more insight into this.
[GitHub] spark issue #19399: [SPARK-22175][WEB-UI] Add status column to history page
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/19399 Given @jerryshao's comments, I'm getting off the fence: I'm firmly against this. We already have too many things slowing down the SHS (Spark History Server) as it is.
[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19082 Aha, fair enough. Based on that insight, one possible solution would be to make whole-stage codegen take the number of calls to generated functions into account, but that approach does not seem simple. So splitting functions step by step seems to be the preferred approach for now...
[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82577/testReport)** for PR 18732 at commit [`a064b21`](https://github.com/apache/spark/commit/a064b21b23d2c3dee9993c3b07d771fa8c09b8ba).
[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18732 Merged some last-minute changes from @BryanCutler to make the wrapping a bit cleaner. Thanks @BryanCutler!
[GitHub] spark pull request #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user fjh100456 commented on a diff in the pull request: https://github.com/apache/spark/pull/19218#discussion_r143624224 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala --- @@ -68,6 +68,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { .get("mapreduce.output.fileoutputformat.compress.type")) } +fileSinkConf.tableInfo.getOutputFileFormatClassName match { + case formatName if formatName.endsWith("ParquetOutputFormat") => +val compressionConf = "parquet.compression" +val compressionCodec = getCompressionByPriority(fileSinkConf, compressionConf, + sparkSession.sessionState.conf.parquetCompressionCodec) match { + case "NONE" => "UNCOMPRESSED" + case _@x => x +} +hadoopConf.set(compressionConf, compressionCodec) + case formatName if formatName.endsWith("OrcOutputFormat") => +val compressionConf = "orc.compress" +val compressionCodec = getCompressionByPriority(fileSinkConf, compressionConf, + sparkSession.sessionState.conf.orcCompressionCodec) match { + case "UNCOMPRESSED" => "NONE" --- End diff -- Yes, they are different: Parquet and ORC use different styles for both the parameter names and the parameter values, so the difference comes from the two formats themselves.
[GitHub] spark pull request #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user fjh100456 commented on a diff in the pull request: https://github.com/apache/spark/pull/19218#discussion_r143624210 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala --- @@ -68,6 +68,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { .get("mapreduce.output.fileoutputformat.compress.type")) } +fileSinkConf.tableInfo.getOutputFileFormatClassName match { + case formatName if formatName.endsWith("ParquetOutputFormat") => +val compressionConf = "parquet.compression" +val compressionCodec = getCompressionByPriority(fileSinkConf, compressionConf, + sparkSession.sessionState.conf.parquetCompressionCodec) match { + case "NONE" => "UNCOMPRESSED" + case _@x => x +} +hadoopConf.set(compressionConf, compressionCodec) + case formatName if formatName.endsWith("OrcOutputFormat") => +val compressionConf = "orc.compress" +val compressionCodec = getCompressionByPriority(fileSinkConf, compressionConf, + sparkSession.sessionState.conf.orcCompressionCodec) match { + case "UNCOMPRESSED" => "NONE" + case _@x => x --- End diff -- In fact, the subsequent code path validates this value, and because `OrcOptions` is not accessible here, I had to add the "UNCOMPRESSED" => "NONE" conversion. Do you have any better suggestions?
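The translation in this diff can be modeled in a few lines. Parquet spells "no compression" as `UNCOMPRESSED` while ORC spells it `NONE`, and the output format is detected by class-name suffix. Below is a Python sketch of the Scala `match` shown above, not Spark code; `hive_compression_setting` is a hypothetical name, and codec names are assumed to be upper-case already.

```python
from typing import Optional, Tuple


def hive_compression_setting(output_format_class: str, codec: str) -> Optional[Tuple[str, str]]:
    """Return (Hadoop conf key, codec value) for a Hive output format, or None.

    Mirrors the diff: Parquet spells "no compression" as UNCOMPRESSED and
    ORC spells it as NONE, so the same setting is translated per format.
    """
    if output_format_class.endswith("ParquetOutputFormat"):
        return "parquet.compression", ("UNCOMPRESSED" if codec == "NONE" else codec)
    if output_format_class.endswith("OrcOutputFormat"):
        return "orc.compress", ("NONE" if codec == "UNCOMPRESSED" else codec)
    return None  # other output formats are left untouched


print(hive_compression_setting("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", "NONE"))
print(hive_compression_setting("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat", "UNCOMPRESSED"))
```

Any other codec name (for example `SNAPPY`) passes through unchanged, matching the `case _@x => x` branches.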
[GitHub] spark pull request #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user fjh100456 commented on a diff in the pull request: https://github.com/apache/spark/pull/19218#discussion_r143624196 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala --- @@ -68,6 +68,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { .get("mapreduce.output.fileoutputformat.compress.type")) } +fileSinkConf.tableInfo.getOutputFileFormatClassName match { + case formatName if formatName.endsWith("ParquetOutputFormat") => +val compressionConf = "parquet.compression" +val compressionCodec = getCompressionByPriority(fileSinkConf, compressionConf, + sparkSession.sessionState.conf.parquetCompressionCodec) match { --- End diff -- `compressionConf` is used below. I've adjusted the formatting, thanks.
[GitHub] spark pull request #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user fjh100456 commented on a diff in the pull request: https://github.com/apache/spark/pull/19218#discussion_r143624181 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala --- @@ -68,6 +68,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { .get("mapreduce.output.fileoutputformat.compress.type")) } +fileSinkConf.tableInfo.getOutputFileFormatClassName match { + case formatName if formatName.endsWith("ParquetOutputFormat") => --- End diff -- Sounds like a good idea.
[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82576/testReport)** for PR 18732 at commit [`b0410a2`](https://github.com/apache/spark/commit/b0410a25f710029e93caf69d9037c843e63f0c41).
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143622623 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,35 @@ class RelationalGroupedDataset protected[sql]( df.logicalPlan.output, df.logicalPlan)) } + + /** + * Applies a vectorized python use-defined function to each group of data. --- End diff -- Thanks! Fixed.
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143622617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.python + +import scala.collection.JavaConverters._ + +import org.apache.spark.TaskContext +import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution, Partitioning} +import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, UnaryExecNode} +import org.apache.spark.sql.types.StructType + +/** + * Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandas]] + * + * Rows in each group are passed to the python worker as a Arrow record batch. 
+ * The python worker turns the record batch to a pandas.DataFrame, invoke the + * use-defined function, and passes the resulting pandas.DataFrame --- End diff -- Thanks! Fixed.
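The doc comment in this diff describes what `FlatMapGroupsInPandasExec` does. The rows of each group go to the Python worker as an Arrow record batch. The worker turns the batch into a pandas.DataFrame, calls the user-defined function, and passes the resulting DataFrame back.

Below is a stdlib-only sketch of just the group-then-apply semantics. Plain lists of dicts stand in for Arrow batches and pandas DataFrames, and there is no Arrow serialization or worker process. `flat_map_groups` and `subtract_mean` are hypothetical names, not PySpark APIs.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def flat_map_groups(
    rows: Iterable[Row],
    key: str,
    func: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Apply ``func`` to every group of rows sharing ``key``; concatenate the outputs."""
    out: List[Row] = []
    # Sort so equal keys are adjacent, since itertools.groupby only merges neighbours.
    ordered = sorted(rows, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        batch = list(group)       # plays the role of one record batch / pandas.DataFrame
        out.extend(func(batch))   # the UDF may return any number of rows
    return out


def subtract_mean(batch: List[Row]) -> List[Row]:
    """Example user-defined function: center column ``v`` within its group."""
    mean = sum(r["v"] for r in batch) / len(batch)
    return [{"id": r["id"], "v": r["v"] - mean} for r in batch]


rows = [{"id": 1, "v": 1.0}, {"id": 2, "v": 5.0}, {"id": 1, "v": 3.0}]
print(flat_map_groups(rows, "id", subtract_mean))
# [{'id': 1, 'v': -1.0}, {'id': 1, 'v': 1.0}, {'id': 2, 'v': 0.0}]
```

The sort step loosely corresponds to the `ClusteredDistribution` requirement and the `GroupedIterator` imported in the diff. Both rely on rows with the same key arriving together.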
[GitHub] spark issue #19463: Cleanup comment in RDDSuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19463 Can one of the admins verify this patch?
[GitHub] spark pull request #19463: Cleanup comment in RDDSuite test
GitHub user sohum2002 opened a pull request: https://github.com/apache/spark/pull/19463 Cleanup comment in RDDSuite test ## What changes were proposed in this pull request? No changes were proposed in this pull request. ## How was this patch tested? No tests were added in this pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohum2002/spark cleanup-RDDSuite-test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19463.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19463 commit c83ab1e5c51311ecb293e47e9c9694a9a49cfbaa Author: Sachathamakul, Patrachai (Agoda) Date: 2017-10-10T03:14:27Z Cleanup comment in RDDSuite test
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82573/ Test PASSed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Merged build finished. Test PASSed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #82573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82573/testReport)** for PR 19459 at commit [`9d667c6`](https://github.com/apache/spark/commit/9d667c6fcb7e47169a2e48ec130fbdbb42a21f41). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten fun...
Github user sohum2002 closed the pull request at: https://github.com/apache/spark/pull/19454
[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten functions ...
Github user sohum2002 commented on the issue: https://github.com/apache/spark/pull/19454 Thank you all for your comments. I hope to improve in my future PRs. Cheers!
[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten functions ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19454 Honestly I don't think it is worth doing this.
[GitHub] spark issue #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names consiste...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19462 cc @rxin @gatorsmile
[GitHub] spark issue #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names consiste...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19462 **[Test build #82575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82575/testReport)** for PR 19462 at commit [`5bef05e`](https://github.com/apache/spark/commit/5bef05e3d84805866103766f6287ecb054dcad68).
[GitHub] spark pull request #19462: [SPARK-22159][SQL][FOLLOW-UP] Make config names c...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19462 [SPARK-22159][SQL][FOLLOW-UP] Make config names consistently end with "enabled". ## What changes were proposed in this pull request? This is a follow-up of #19384. In the previous PR, only the definitions of the config names were modified, but we also need to modify the names specified as string literals at runtime or in tests. ## How was this patch tested? Existing tests, with the config names modified. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22159/fup1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19462.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19462 commit 5bef05e3d84805866103766f6287ecb054dcad68 Author: Takuya UESHIN Date: 2017-10-10T02:23:47Z Fix config names.
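The PR description explains that #19384 renamed only the config definitions. Config names written as string literals at runtime or in tests also had to be updated. Below is a hypothetical Python scan showing how such leftovers could be found: it flags double-quoted `spark.*` keys that end in `.enable` instead of `.enabled`. The regex and `find_stale_keys` are illustrative assumptions, not part of the PR.

```python
import re
from typing import List

# A double-quoted Spark config key whose last segment is exactly "enable".
STALE_KEY = re.compile(r'"(spark(?:\.[A-Za-z0-9_]+)*\.enable)"')


def find_stale_keys(source: str) -> List[str]:
    """Return config keys in ``source`` that end in '.enable' rather than '.enabled'."""
    return STALE_KEY.findall(source)


snippet = '''
self.spark.conf.set("spark.sql.execution.arrow.enable", "true")
self.spark.conf.set("spark.sql.execution.arrow.enabled", "true")
'''
print(find_stale_keys(snippet))  # ['spark.sql.execution.arrow.enable']
```

The closing quote in the pattern is what keeps `.enabled` from matching. Keys built by string concatenation would slip through, so a scan like this only supplements review.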
[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19082 The above reasoning also explains the motivation and effect of #18931. The generated code of query operators is extracted into individual smaller functions, which gives the JIT a chance to step in.
[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19082 @maropu The aggregation code is actually wrapped in a function, `doAggregateWithKeys`/`doAggregateWithoutKey`. This is also the part of the generated code that this PR improves by extracting functions. My initial thought is that, during query processing, `doAggregateWithKeys`/`doAggregateWithoutKey` runs only once to aggregate all rows. Whether or not it is a long function, the JIT has no chance to step in, so the length of this function doesn't matter much for the JIT issue. The long-function issue hurts whole-stage codegen performance when a long function is run many times without being compiled, which drags down the speed of the other generated code. Since `doAggregateWithKeys`/`doAggregateWithoutKey` runs only once, its length has little impact, so the whole-stage codegen query is still faster than the non-whole-stage one. This PR improves aggregation because it extracts small functions from `doAggregateWithKeys`/`doAggregateWithoutKey`. Those functions are called many times from the wrapping function, so the JIT now has room to step in.
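viirya's argument rests on invocation counts:

* The wrapping `doAggregateWithKeys`/`doAggregateWithoutKey` function runs once.
* Small functions extracted from it are called once per row, so they quickly become "hot" candidates for JIT compilation.

Below is a Python sketch that models only this counting argument, using a call-counting decorator. It does not simulate a JIT. `update_buffer` and `do_aggregate_without_key` are hypothetical stand-ins for generated code.

```python
from collections import Counter
from functools import wraps

calls: Counter = Counter()


def counted(fn):
    """Record how many times ``fn`` is invoked (a stand-in for JIT 'hotness')."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        calls[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@counted
def update_buffer(buffer, value):
    # Small extracted function: invoked once per input row.
    return buffer + value


@counted
def do_aggregate_without_key(rows):
    # Wrapping function: invoked once, looping over every row.
    buffer = 0
    for value in rows:
        buffer = update_buffer(buffer, value)
    return buffer


result = do_aggregate_without_key(range(10_000))
print(result, calls["do_aggregate_without_key"], calls["update_buffer"])
# 49995000 1 10000
```

Real HotSpot behavior is more nuanced than this model in two ways:

* On-stack replacement can compile a hot loop inside a method that is called only once.
* By default, methods larger than 8000 bytes of bytecode are not JIT-compiled at all.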
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82572/ Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Merged build finished. Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19460 **[Test build #82572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82572/testReport)** for PR 19460 at commit [`0f82f2d`](https://github.com/apache/spark/commit/0f82f2d5d49fc64e7a8ac4714900417a55ba72d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19363: [SPARK-22224][Minor]Override toString of KeyValue/Relati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19363 **[Test build #82574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82574/testReport)** for PR 19363 at commit [`fe0d64a`](https://github.com/apache/spark/commit/fe0d64a1d5080d10fe6743f725107221acb9dd62).
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143614190 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,35 @@ class RelationalGroupedDataset protected[sql]( df.logicalPlan.output, df.logicalPlan)) } + + /** + * Applies a vectorized python use-defined function to each group of data. --- End diff -- nit: `use-defined` -> `user-defined`
[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143614283 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.python + +import scala.collection.JavaConverters._ + +import org.apache.spark.TaskContext +import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution, Partitioning} +import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, UnaryExecNode} +import org.apache.spark.sql.types.StructType + +/** + * Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandas]] + * + * Rows in each group are passed to the python worker as a Arrow record batch. 
+ * The python worker turns the record batch to a pandas.DataFrame, invoke the + * use-defined function, and passes the resulting pandas.DataFrame --- End diff -- nit: `use-defined` -> `user-defined`
[GitHub] spark issue #19399: [SPARK-22175][WEB-UI] Add status column to history page
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19399 I agree with @squito that the criteria for defining an application's success should be carefully considered. In your current code, an application is marked as successful only if all of its jobs succeed; isn't that too strict, since it allows no job failures and retries? Also, if an application runs all of its Spark jobs successfully but fails in its own code (e.g., saving to a DB) and exits with a non-zero code, should we mark it as succeeded or failed? In addition, the structure that tracks all the jobs, `jobToStatus`, will grow memory usage indefinitely in a long-running application. Finally, with your changes I can see that page loading time will increase; for applications with many jobs (like Spark Streaming) the problem will be severe.
[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19082 Either way, I think we first need to understand why the regression on `q66` happens when whole-stage codegen is turned off. We first thought that turning off too-long functions would give better performance, but that is not always true. We should also check whether this regression happens on other JVM implementations. I'll look into these issues next.
[GitHub] spark pull request #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten fun...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19454#discussion_r143612478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2543,6 +2543,14 @@ class Dataset[T] private[sql]( mapPartitions(_.flatMap(func)) /** +* Returns a new Dataset by by flattening a traversable collection into a collection itself. +* --- End diff -- (and `by by` -> `by` I guess)
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143610100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -203,4 +205,16 @@ private[sql] object ArrowConverters { reader.close() } } + + def toDataFrame( --- End diff -- Yup, I think we should put it there.
[GitHub] spark pull request #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten fun...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19454#discussion_r143608933 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -63,6 +63,7 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext { assert(nums.map(_.toString).collect().toList === List("1", "2", "3", "4")) assert(nums.filter(_ > 2).collect().toList === List(3, 4)) assert(nums.flatMap(x => 1 to x).collect().toList === List(1, 1, 2, 1, 2, 3, 1, 2, 3, 4)) +assert(sc.makeRDD(Array(Array(1,2,3,4), Array(1,2,3,4))).flatten == List(1,2,3,4,1,2,3,4)) --- End diff -- `.flatten.collect().toList`.
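The test in this diff expects flattening `Array(Array(1,2,3,4), Array(1,2,3,4))` to yield `List(1,2,3,4,1,2,3,4)`. viirya points out that the RDD must first be materialized with `.collect().toList` before the comparison. The proposed `flatten` behaves like `flatMap` with the identity function. Below is a stdlib Python model of that equivalence. `flat_map` and `flatten` are illustrative local functions, not the RDD or Dataset methods.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def flat_map(data: Iterable[T], func: Callable[[T], Iterable[U]]) -> List[U]:
    """Apply ``func`` to each element and concatenate the resulting iterables."""
    return list(chain.from_iterable(func(x) for x in data))


def flatten(data: Iterable[Iterable[U]]) -> List[U]:
    """Flattening is flatMap with the identity function."""
    return flat_map(data, lambda xs: xs)


nested = [[1, 2, 3, 4], [1, 2, 3, 4]]
print(flatten(nested))  # [1, 2, 3, 4, 1, 2, 3, 4]
# Same shape as the existing `nums.flatMap(x => 1 to x)` assertion in the diff:
print(flat_map([1, 2, 3, 4], lambda x: range(1, x + 1)))  # [1, 1, 2, 1, 2, 3, 1, 2, 3, 4]
```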
[GitHub] spark pull request #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten fun...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19454#discussion_r143607680 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -382,6 +382,13 @@ abstract class RDD[T: ClassTag]( } /** +* Return a new RDD by flattening a traversable collection into a collection itself. +*/ --- End diff -- Please follow existing comment style like line 392.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #82573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82573/testReport)** for PR 19459 at commit [`9d667c6`](https://github.com/apache/spark/commit/9d667c6fcb7e47169a2e48ec130fbdbb42a21f41).
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143607522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -203,4 +205,16 @@ private[sql] object ArrowConverters { reader.close() } } + + def toDataFrame( --- End diff -- I had to make this public to be callable with py4j. Alternatively, something could be added to `o.a.s.sql.api.python.PythonSQLUtils`?
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143606693 --- Diff: python/pyspark/sql/tests.py --- @@ -3147,6 +3150,14 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i") self.assertTrue(pdf.empty) +def test_createDataFrame_toggle(self): +pdf = self.createPandasDataFrameFromeData() +self.spark.conf.set("spark.sql.execution.arrow.enable", "false") +df_no_arrow = self.spark.createDataFrame(pdf) +self.spark.conf.set("spark.sql.execution.arrow.enable", "true") --- End diff -- Hmmm, I thought the `tearDownClass` was there, but it's actually in #18664. Maybe I should put it in here, since that PR needs some more discussion.
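The test in this diff toggles the Arrow config on the shared session. The discussion is about restoring that setting (for example in `tearDownClass`) so a failing assertion does not leak it into later tests. A common alternative pattern sets the value inside a context manager that restores the previous state in a `finally` block. Below is a minimal sketch of that pattern. It makes two assumptions:

* `FakeConf` is a hypothetical in-memory stand-in exposing `get`/`set`/`unset`, the same method names as PySpark's `RuntimeConfig`.
* `temp_conf` is a hypothetical helper, not a PySpark API.

The key uses the `enabled` spelling that #19462 standardizes on.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class FakeConf:
    """In-memory stand-in with get/set/unset methods, loosely modelled on spark.conf."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._values.get(key, default)

    def set(self, key: str, value: str) -> None:
        self._values[key] = value

    def unset(self, key: str) -> None:
        self._values.pop(key, None)


@contextmanager
def temp_conf(conf, key: str, value: str) -> Iterator[None]:
    """Set ``key`` to ``value`` inside the block and restore the previous state afterwards."""
    previous = conf.get(key, None)  # None means the key was not set
    conf.set(key, value)
    try:
        yield
    finally:  # runs even if an assertion inside the block fails
        if previous is None:
            conf.unset(key)
        else:
            conf.set(key, previous)


KEY = "spark.sql.execution.arrow.enabled"
conf = FakeConf()
conf.set(KEY, "false")
try:
    with temp_conf(conf, KEY, "true"):
        assert conf.get(KEY) == "true"
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass
print(conf.get(KEY))  # false
```

Scoping the change to a block keeps each test's config edits local. This avoids relying on a class-level teardown running in the right order.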
[GitHub] spark pull request #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten fun...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19454#discussion_r143606572 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2543,6 +2543,14 @@ class Dataset[T] private[sql]( mapPartitions(_.flatMap(func)) /** +* Returns a new Dataset by by flattening a traversable collection into a collection itself. +* --- End diff -- @group typedrel?
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82567/ Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Merged build finished. Test PASSed.
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143605840 --- Diff: python/pyspark/sql/tests.py --- @@ -3147,6 +3150,14 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i") self.assertTrue(pdf.empty) +def test_createDataFrame_toggle(self): +pdf = self.createPandasDataFrameFromeData() +self.spark.conf.set("spark.sql.execution.arrow.enable", "false") +df_no_arrow = self.spark.createDataFrame(pdf) +self.spark.conf.set("spark.sql.execution.arrow.enable", "true") --- End diff -- Done. I guess this would make the failure easier to see?
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19460 **[Test build #82567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82567/testReport)** for PR 19460 at commit [`90ecbcc`](https://github.com/apache/spark/commit/90ecbcc4f6909d7243a69014d5f76753fb451452). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19442 @VDuda Thanks for asking. This is a big change. I hope this PR can resolve SPARK-8515. Most APIs are ready. I'm working on compatibility with the current attribute APIs. When it is ready, I'll re-open this for review.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82564/ Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Merged build finished. Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19460 **[Test build #82564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82564/testReport)** for PR 19460 at commit [`09a1d5f`](https://github.com/apache/spark/commit/09a1d5fb689a979e5a48de7f90dfbc1f066bea86). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19461 Discussed offline. We don't need to backport to branch-2.2.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18664 OK, sounds good. Could I get some opinions on the best way to convert internal Spark timestamps, since they are stored as UTC? I think we have the following options: 1. Write Arrow data with SESSION_LOCAL timestamps (as this PR currently does), then convert to timezone-naive local time in Python after the data is loaded into Pandas. This would happen at the end of `toPandas()`, or just before the user function is called in `pandas_udf`s, with a conversion back to UTC just after. 2. Convert Spark internal data to timezone-naive local time in Scala and write it to Arrow as timezone-naive data. With (1) the conversion is easy to do with Pandas, but we have to make sure it gets done in multiple places. With (2) it happens in just one spot, but I'm not sure whether we could end up doing the conversion more than once.
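Both options above come down to where a UTC instant is turned into timezone-naive local time. Below is a minimal stdlib sketch of that conversion and its inverse. It makes two assumptions:

* Timestamps are microseconds since the Unix epoch in UTC, which is Spark's internal `TimestampType` representation.
* A hypothetical fixed UTC-7 offset stands in for the session time zone, so no tz database is needed.

It also shows the risk of converting twice: a second pass silently shifts the value by the offset again. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_micros_to_local_naive(micros: int, tz: timezone) -> datetime:
    """Convert UTC microseconds since the epoch to timezone-naive local time in ``tz``."""
    aware_utc = EPOCH + timedelta(microseconds=micros)
    return aware_utc.astimezone(tz).replace(tzinfo=None)


def local_naive_to_utc_micros(local: datetime, tz: timezone) -> int:
    """Inverse: interpret a naive local time in ``tz`` and return UTC microseconds."""
    aware = local.replace(tzinfo=tz)
    return (aware - EPOCH) // timedelta(microseconds=1)


session_tz = timezone(timedelta(hours=-7))  # hypothetical session time zone
micros = 1_507_600_000_000_000              # 2017-10-10 01:46:40 UTC
local = utc_micros_to_local_naive(micros, session_tz)
print(local)  # 2017-10-09 18:46:40
assert local_naive_to_utc_micros(local, session_tz) == micros  # lossless round trip

# Treating the already-local value as UTC and converting again shifts it a second time.
twice = utc_micros_to_local_naive(local_naive_to_utc_micros(local, timezone.utc), session_tz)
print(twice)  # 2017-10-09 11:46:40
```

Because the naive value no longer records which zone it is in, nothing prevents the second conversion. That is why keeping the conversion in a single, well-defined place matters.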
[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269 Merged build finished. Test FAILed.
[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82571/ Test FAILed.
[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19269 **[Test build #82571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82571/testReport)** for PR 19269 at commit [`2d41e44`](https://github.com/apache/spark/commit/2d41e44a1ae4067e55d19cf0425a8eb2e7d97b2a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class WriteToDataSourceV2Command(writer: DataSourceV2Writer, query: LogicalPlan)` * `class RowToInternalRowDataWriteFactory(rowWriterFactory: DataWriteFactory[Row], schema: StructType)` * `class RowToInternalRowDataWriter(rowWriter: DataWriter[Row], encoder: ExpressionEncoder[Row])`
[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/19270 So I think I know why the appId was handled the way it was: the live app UI no longer works because the appId var is "undefined" in all the API calls.
[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19270 Merged build finished. Test PASSed.
[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19270 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82563/ Test PASSed.
[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19270 **[Test build #82563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82563/testReport)** for PR 19270 at commit [`0b2a8cf`](https://github.com/apache/spark/commit/0b2a8cfaab8fa6bcb92176f74dce2f47ba65454d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19461 Oh, there are some conflicts with 2.2. @joseph-torres could you submit a backport PR, please?
[GitHub] spark pull request #19461: [SPARK-22230] Swap per-row order in state store r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19461
[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19461 Thanks! Merging to master and 2.2.
[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19461 Merged build finished. Test PASSed.
[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82566/ Test PASSed.
[GitHub] spark issue #19461: [SPARK-22230] Swap per-row order in state store restore.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19461 **[Test build #82566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82566/testReport)** for PR 19461 at commit [`17ef8a8`](https://github.com/apache/spark/commit/17ef8a843e7dec8da0625caeda213cb1f5c64a4a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143600411 --- Diff: python/pyspark/sql/tests.py --- @@ -3147,6 +3150,14 @@ def test_filtered_frame(self): self.assertEqual(pdf.columns[0], "i") self.assertTrue(pdf.empty) +def test_createDataFrame_toggle(self): +pdf = self.createPandasDataFrameFromeData() +self.spark.conf.set("spark.sql.execution.arrow.enable", "false") +df_no_arrow = self.spark.createDataFrame(pdf) +self.spark.conf.set("spark.sql.execution.arrow.enable", "true") --- End diff -- I'd set this back to `true` in a `finally` block, just in case the test fails in `df_no_arrow = self.spark.createDataFrame(pdf)`; otherwise `spark.sql.execution.arrow.enable` would remain `false` and affect other test cases, unless I'm missing something.
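The reviewer's suggestion can be sketched as a small context manager that always restores a config value in a `finally` block, even when the code under test raises. This is a minimal sketch, not the code from the PR: `FakeConf` and `temp_conf` are hypothetical names, and `FakeConf` is a plain in-memory stand-in for a session's runtime config, so the real `spark.conf` object may differ in details (for example, in what it accepts as a default value).

```python
from contextlib import contextmanager

# Sentinel meaning "the key was not set before we touched it".
_MISSING = object()


class FakeConf:
    """Hypothetical in-memory stand-in for a runtime config (not the PySpark API)."""

    def __init__(self):
        self._values = {}

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value

    def unset(self, key):
        self._values.pop(key, None)


@contextmanager
def temp_conf(conf, key, value):
    """Set `key` to `value` for the duration of the block, then restore it.

    The restore runs in `finally`, so a failure inside the block cannot
    leak the temporary value into later test cases.
    """
    previous = conf.get(key, _MISSING)
    conf.set(key, value)
    try:
        yield
    finally:
        if previous is _MISSING:
            conf.unset(key)  # key did not exist before: remove it again
        else:
            conf.set(key, previous)  # put the original value back
```

In the test quoted above, wrapping the non-Arrow `createDataFrame` call in such a block would keep a failure in that call from leaving the flag set to `false` for the remaining test cases.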
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82570/ Test FAILed.
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19433 Merged build finished. Test FAILed.
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19433 **[Test build #82570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82570/testReport)** for PR 19433 at commit [`abc86b2`](https://github.com/apache/spark/commit/abc86b2042e0fd42cc0e9fe20cf79967b16e9779). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Merged build finished. Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82562/ Test PASSed.
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19460 **[Test build #82562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82562/testReport)** for PR 19460 at commit [`92a6d2d`](https://github.com/apache/spark/commit/92a6d2d53aea02042d47888e99df5a4f2167cd1f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 Yup, I think we already don't have timezone handling in `udf` either? I think we are fine as long as this keeps the existing behaviour. Let's not forget to handle all those cases when we deal with timezones in a separate PR.
[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82561/ Test PASSed.
[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19250 Merged build finished. Test PASSed.