[GitHub] spark issue #22194: [SPARK-23932][SQL][FOLLOW-UP] Fix an example of zip_with...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/22194 @ueshin LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210452329

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -442,3 +442,91 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
+       array(('a', 1), ('b', 3), ('c', 5))
+      > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y));
+       array(4, 6)
+      > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
+       array('ad', 'be', 'cf')
+  """,
+  since = "2.4.0")
+// scalastyle:on line.size.limit
+case class ArraysZipWith(
+    left: Expression,
+    right: Expression,
+    function: Expression)
+  extends HigherOrderFunction with CodegenFallback with ExpectsInputTypes {
+
+  override def inputs: Seq[Expression] = List(left, right)
+
+  override def functions: Seq[Expression] = List(function)
+
+  def expectingFunctionType: AbstractDataType = AnyDataType
+
+  @transient lazy val functionForEval: Expression = functionsForEval.head
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType, expectingFunctionType)
+
+  override def nullable: Boolean = inputs.exists(_.nullable)
+
+  override def dataType: ArrayType = ArrayType(function.dataType, function.nullable)
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArraysZipWith = {
+    val (leftElementType, leftContainsNull) = left.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    val (rightElementType, rightContainsNull) = right.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    copy(function = f(function,
+      (leftElementType, leftContainsNull) :: (rightElementType, rightContainsNull) :: Nil))
--- End diff --

@mn-mikke @ueshin "Both arrays must be the same length" was how zip_with in Presto used to work; they have since moved to appending nulls and processing the arrays regardless of length.
[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/22031 Hi @ueshin, I will update the PR tomorrow
[GitHub] spark pull request #22031: [TODO][SPARK-23932][SQL] Higher order function zi...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/22031 [TODO][SPARK-23932][SQL] Higher order function zip_with

## What changes were proposed in this pull request?
Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function:
```
SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- [ROW('a', 1), ROW('b', 3), ROW('c', 5)]
SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6]
SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> concat(x, y)); -- ['ad', 'be', 'cf']
SELECT zip_with(ARRAY['a'], ARRAY['d', null, 'f'], (x, y) -> coalesce(x, y)); -- ['a', null, 'f']
```

## How was this patch tested?
Added tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-23932 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22031.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22031 commit 03d19cee425be90a61b60163ff9d6740716d45a6 Author: Sandeep Singh Date: 2018-08-03T04:15:00Z . commit 6f91777de93121d668ff11e7701f449bb4c96337 Author: Sandeep Singh Date: 2018-08-04T22:00:38Z fix description
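The null-padding semantics described in the PR can be modeled in plain Python (a hypothetical sketch of the behavior, not the Catalyst implementation):

```python
def zip_with(left, right, func):
    """Model of SQL zip_with: pad the shorter array with None,
    then apply func element-wise over the aligned pairs."""
    n = max(len(left), len(right))
    left = list(left) + [None] * (n - len(left))
    right = list(right) + [None] * (n - len(right))
    return [func(x, y) for x, y in zip(left, right)]

print(zip_with([1, 2], [3, 4], lambda x, y: x + y))  # [4, 6]
# coalesce-style lambda, mirroring the last SQL example:
print(zip_with(["a"], ["d", None, "f"], lambda x, y: x if x is not None else y))
# ['a', None, 'f']
```

Note that func must itself tolerate None inputs, which is why the coalesce example in the PR description is the interesting one.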
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @HyukjinKwon didn't have bandwidth; will try to finish this weekend --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @HyukjinKwon was busy, will restart this week.
[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @sethah I will revive this PR, thanks 👍
[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @MLnick I will create an umbrella JIRA and start adding JIRAs for things I'm aware of, and you can start prioritising 👍 Sounds like a plan?
[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @sethah @yanboliang I've started with migrating `IDF`; can you review the WIP and check if I'm going in the right direction: https://github.com/techaddict/spark/pull/2/files There is some code duplication where we can make the mllib code actually depend on the ml one.
[GitHub] spark pull request #16101: [WIP] Migrate IDF to not use mllib
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/16101 [WIP] Migrate IDF to not use mllib You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark migrate-idf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16101.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16101 commit fa220a6b21dd36591a44dfb7d32494fee7c60b08 Author: Sandeep Singh Date: 2016-12-01T14:59:36Z add transform commit cb869eb71392f47b9e63af3ba6aeaa031523baaf Author: Sandeep Singh Date: 2016-12-01T15:02:30Z make IDFModel work commit d1bb36d3c93e99214aeaec34bffdd63c82724f89 Author: Sandeep Singh Date: 2016-12-01T15:02:59Z since tag commit 89546ec4e5248d71db39b519cf6a6d072b767bd1 Author: Sandeep Singh Date: 2016-12-01T15:15:37Z works commit 72f8c7d59da2224bd71b0d56e1f2c388e277f9df Author: Sandeep Singh Date: 2016-12-01T15:22:27Z works commit 5cb2c3e4df4807941647e72cec1f41ce4f02018b Author: Sandeep Singh Date: 2016-12-01T15:32:00Z migrate everything to ml
[GitHub] spark pull request #16101: [WIP] Migrate IDF to not use mllib
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/16101
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley @holdenk @viirya PR updated
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley @holdenk will update the PR with changes today.
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 ping @davies @jkbradley
[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @sethah I agree, 2nd approach is much more reasonable.
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 @jkbradley done 👍
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @holdenk updated the description.
[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/15817#discussion_r87621123

--- Diff: python/pyspark/ml/feature.py ---
@@ -1163,9 +1184,11 @@ class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadab
     >>> df = spark.createDataFrame([(0.1,), (0.4,), (1.2,), (1.5,)], ["values"])
     >>> qds = QuantileDiscretizer(numBuckets=2,
-    ...     inputCol="values", outputCol="buckets", relativeError=0.01)
+    ...     inputCol="values", outputCol="buckets", relativeError=0.01, handleInvalid="error")
     >>> qds.getRelativeError()
     0.01
+    >>> qds.getHandleInvalid()
--- End diff --

good idea! adding
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 @MLnick thanks for the review, addressed your comments.
[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/15817#discussion_r87593705

--- Diff: python/pyspark/ml/feature.py ---
@@ -1194,21 +1217,30 @@ class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadab
                           "Must be in the range [0, 1].",
                           typeConverter=TypeConverters.toFloat)

+    handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " +
+                          "Options are skip (filter out rows with invalid values), " +
+                          "error (throw an error), or keep (keep invalid values in a special " +
+                          "additional bucket).",
+                          typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001):
+    def __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
+                 handleInvalid="error"):
         """
-        __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001)
+        __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
+                 handleInvalid="error")
         """
         super(QuantileDiscretizer, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.QuantileDiscretizer",
                                             self.uid)
-        self._setDefault(numBuckets=2, relativeError=0.001)
+        self._setDefault(numBuckets=2, relativeError=0.001, handleInvalid="error")
         kwargs = self.__init__._input_kwargs
         self.setParams(**kwargs)

     @keyword_only
     @since("2.0.0")
-    def setParams(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001):
+    def setParams(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
+                  handleInvalid="error"):
--- End diff --

fixed
[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/15817#discussion_r87593693

--- Diff: python/pyspark/ml/feature.py ---
@@ -158,19 +158,26 @@ class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, Jav
                    "splits specified will be treated as errors.",
                    typeConverter=TypeConverters.toListFloat)

+    handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " +
+                          "Options are skip (filter out rows with invalid values), " +
+                          "error (throw an error), or keep (keep invalid values in a special " +
+                          "additional bucket).",
+                          typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, splits=None, inputCol=None, outputCol=None):
+    def __init__(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error"):
         """
-        __init__(self, splits=None, inputCol=None, outputCol=None)
+        __init__(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error")
         """
         super(Bucketizer, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.Bucketizer", self.uid)
+        self._setDefault(handleInvalid="error")
         kwargs = self.__init__._input_kwargs
         self.setParams(**kwargs)

     @keyword_only
     @since("1.4.0")
-    def setParams(self, splits=None, inputCol=None, outputCol=None):
+    def setParams(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error"):
--- End diff --

fixed
[GitHub] spark pull request #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/15843#discussion_r87550799

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -33,6 +33,10 @@ def __init__(self, java_obj=None):
         super(JavaWrapper, self).__init__()
         self._java_obj = java_obj

+    def __del__(self):
+        if SparkContext._active_spark_context:
--- End diff --

Checking if there is an active SparkContext; I got this error after `quit()` in `pyspark`:
```
Exception ignored in:
Traceback (most recent call last):
  File "/Users/xx/Project/Spark/python/pyspark/ml/wrapper.py", line 37, in __del__
    SparkContext._active_spark_context._gateway.detach(self._java_obj)
AttributeError: 'NoneType' object has no attribute '_gateway'
Exception ignored in:
Traceback (most recent call last):
  File "/Users/xx/Project/Spark/python/pyspark/ml/wrapper.py", line 37, in __del__
AttributeError: 'NoneType' object has no attribute '_gateway'
```
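The guard discussed here, skipping the detach when the gateway is already gone at interpreter shutdown, can be sketched with a stand-in gateway (hypothetical class names; not the actual pyspark `JavaWrapper` or py4j gateway):

```python
class FakeGateway:
    """Stand-in for the py4j gateway: records which objects were detached."""
    def __init__(self):
        self.detached = []

    def detach(self, obj):
        self.detached.append(obj)


class JavaWrapperSketch:
    # Class-level stand-in for SparkContext._active_spark_context._gateway.
    active_gateway = None

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def __del__(self):
        # Only detach while the gateway still exists; after quit() it is
        # None and calling detach would raise AttributeError (as in the
        # traceback above).
        if JavaWrapperSketch.active_gateway is not None:
            JavaWrapperSketch.active_gateway.detach(self._java_obj)


gw = FakeGateway()
JavaWrapperSketch.active_gateway = gw
w = JavaWrapperSketch("jvm-object-1")
del w  # CPython refcounting runs __del__ here; the object is detached
print(gw.detached)  # ['jvm-object-1']

JavaWrapperSketch.active_gateway = None
w2 = JavaWrapperSketch("jvm-object-2")
del w2  # no error: the guard skips the detach
```

Without the detach, the gateway keeps a strong reference to every wrapped JVM object, which is exactly the leak this PR fixes.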
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley looks good, merged 👍
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley yes, I did it for `JavaWrapper` first, but running tests with it gives: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68478/consoleFull
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 cc: @jkbradley @davies @holdenk
[GitHub] spark issue #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15817 cc: @sethah @marmbrus
[GitHub] spark pull request #15843: [SPARK-18274] Memory leak in PySpark StringIndexe...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15843 [SPARK-18274] Memory leak in PySpark StringIndexer

## What changes were proposed in this pull request?
Make the Java gateway dereference the object in the destructor, using `SparkContext._gateway.detach` inside `JavaWrapper`'s destructor.

## How was this patch tested?
```python
import random, string
from pyspark.ml.feature import StringIndexer

l = [(''.join(random.choice(string.ascii_uppercase) for _ in range(10)), )
     for _ in range(int(7e5))]  # 700,000 random strings of 10 characters
df = spark.createDataFrame(l, ['string'])

for i in range(50):
    indexer = StringIndexer(inputCol='string', outputCol='index')
    indexer.fit(df)
```
Before: the StringIndexer objects kept strong references, causing GC issues, and the loop halted midway. After: garbage collection works as the objects are dereferenced, and the computation completes. Testing was done using a profiler.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-18274 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15843.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15843 commit a493c1961829000986446db11ce67f3103a79bea Author: Sandeep Singh Date: 2016-11-10T16:16:13Z [SPARK-18274] Memory leak in PySpark StringIndexer
[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 cc: @dbtsai @mengxr
[GitHub] spark pull request #15831: [SPARK-18385][ML] Make the transformer's natively...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15831 [SPARK-18385][ML] Make the transformer's natively in ml framework to avoid extra conversion

## What changes were proposed in this pull request?
Transformers are added natively in the ml framework to avoid extra conversion for: ChiSqSelector, IDF, StandardScaler, PCA

## How was this patch tested?
Existing tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark ml-transformer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15831.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15831 commit da3626168ce264719517a8d34afdc500991fb700 Author: Sandeep Singh Date: 2016-11-09T14:53:14Z ChiSqSelector: make the transformer natively in ml framework to avoid extra conversion commit 733394fb3d7f4ea6891a4f6b0e41a03c9a1abc38 Author: Sandeep Singh Date: 2016-11-09T15:40:24Z add transformer for IDF commit da437316879a6e2cb9df9549e28ea9b1b95b63d5 Author: Sandeep Singh Date: 2016-11-09T15:55:22Z add StandardScaler transform commit a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b Author: Sandeep Singh Date: 2016-11-09T16:03:01Z add PCA transform
[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15817 [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

## What changes were proposed in this pull request?
Added the new handleInvalid param for these transformers to Python to maintain API parity.

## How was this patch tested?
Existing tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-18366 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15817.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15817 commit 0e41b36493fcb5eee5f342f694b0d2bc2a1e6c41 Author: Sandeep Singh Date: 2016-11-09T00:03:21Z add handleInvalid to QuantileDiscretizer commit 3b5133cac34dc42db71fabbf12c0a8e44d0fb2ba Author: Sandeep Singh Date: 2016-11-09T00:09:13Z fix lint issues commit 20bfd9b3e1028e54619a992a4b333b4fe8c694bc Author: Sandeep Singh Date: 2016-11-09T00:15:10Z handleInvalid to Bucketizer commit 19224724350a6d6c1936b496784131309ce286b0 Author: Sandeep Singh Date: 2016-11-09T00:15:52Z fix lint error commit b4720aa49eb94092aa255dcaa47f3e52b44cd6d2 Author: Sandeep Singh Date: 2016-11-09T00:21:04Z Merge branch 'master' into SPARK-18366
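The three handleInvalid options being exposed to Python can be modeled in plain Python (a simplified sketch of Bucketizer's behavior where only NaN is treated as invalid; not the Spark implementation):

```python
import bisect
import math

def bucketize(values, splits, handle_invalid="error"):
    """Assign each value a bucket index from sorted split points.
    handle_invalid: 'error' raises on an invalid value, 'skip' drops
    the row, 'keep' puts it in a special additional bucket."""
    out = []
    for v in values:
        if math.isnan(v):
            if handle_invalid == "error":
                raise ValueError("invalid value: NaN")
            elif handle_invalid == "skip":
                continue  # filter out the row
            else:  # "keep": one extra bucket past the regular ones
                out.append(len(splits) - 1)
        else:
            out.append(bisect.bisect_right(splits, v) - 1)
    return out

print(bucketize([0.1, 1.2], [0.0, 1.0, 2.0]))                    # [0, 1]
print(bucketize([0.1, float("nan")], [0.0, 1.0, 2.0], "keep"))   # [0, 2]
```

With three splits there are two regular buckets (0 and 1), so the "keep" bucket is index 2, mirroring the "special additional bucket" wording in the param description.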
[GitHub] spark issue #15809: [SPARK-18268] ALS.run fail with better message if rating...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15809 @srowen done 👍
[GitHub] spark pull request #15809: [SPARK-18268] ALS.run fail with better message if...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15809 [SPARK-18268] ALS.run fail with better message if ratings is empty rdd ## What changes were proposed in this pull request? ALS.run should fail with a better message if `ratings` is an empty RDD; ALS.train and ALS.trainImplicit are also affected. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-18268 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15809.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15809 commit 8080583922dc8f274d559ec3ea985d1bc9d171b9 Author: Sandeep Singh Date: 2016-11-08T15:22:49Z [SPARK-18268] ALS.run fail with better message if ratings is empty rdd ALS.train and ALS.trainImplicit are also affected
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15654 @mgummelt yes, working on it.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15654 @mgummelt done! 👍
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15654 cc: @mgummelt @srowen
[GitHub] spark pull request #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15654 [SPARK-16881][MESOS] Migrate Mesos configs to use ConfigEntry ## What changes were proposed in this pull request? Migrate Mesos configs to use ConfigEntry ## How was this patch tested? Jenkins Tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-16881 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15654.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15654 commit 55ff640abd8703826590bde7d0d4f7604272142e Author: Sandeep Singh Date: 2016-10-27T02:59:16Z [SPARK-16881] Migrate Mesos configs to use ConfigEntry commit af306bd3c2d182d890fd769dffb190da2c2620ab Author: Sandeep Singh Date: 2016-10-27T02:59:57Z Merge branch 'master' into SPARK-16881
[GitHub] spark pull request #15433: [SPARK-17822][SPARKR] Use weak reference in JVMOb...
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/15433
[GitHub] spark issue #15433: [SPARK-17822][SPARKR] Use weak reference in JVMObjectTra...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15433 Closing this, since it's maybe not the right way to do this.
[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/12913 @rxin can you review again? All comments addressed 👍
[GitHub] spark pull request #12913: [SPARK-928][CORE] Add support for Unsafe-based se...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/12913#discussion_r84570678 --- Diff: core/src/test/scala/org/apache/spark/serializer/UnsafeKryoSerializerSuite.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.serializer + +class UnsafeKryoSerializerSuite extends KryoSerializerSuite { + + // This test suite should run all tests in KryoSerializerSuite with kryo unsafe. + + override def beforeAll() { +super.beforeAll() +conf.set("spark.kryo.unsafe", "true") --- End diff -- Ohh yes, fixed and tested 👍
[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/12913 @mateiz updated 👍
[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/12913 @mateiz updated the PR 👍
[GitHub] spark issue #15433: [SPARK-17822][SPARKR] Use weak reference in JVMObjectTra...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15433 @shivaram @srowen not sure why it's failing, will try to fix this ASAP.
[GitHub] spark pull request #15433: [SPARK-17822] Use weak reference in JVMObjectTrac...
GitHub user techaddict reopened a pull request: https://github.com/apache/spark/pull/15433 [SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may leak JVM objects ## What changes were proposed in this pull request? Use weak reference in JVMObjectTracker.objMap because it may leak JVM objects ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-17822 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15433.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15433 commit 7023c40a99eaa81ee7bcd202a4b74df811d0cfc7 Author: Sandeep Singh Date: 2016-10-10T11:54:34Z [SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may leak JVM objects commit 69845947df62187eb40f3cc6468b52e38bdab897 Author: Sandeep Singh Date: 2016-10-10T13:23:56Z Merge branch 'master' into SPARK-17822 commit 995611d75351d24907ce2b22e7d33752cc803da3 Author: Sandeep Singh Date: 2016-10-11T13:13:09Z Merge branch 'master' into SPARK-17822 commit 8e763bef78fe147e84e1771f237a75ff42780705 Author: Sandeep Singh Date: 2016-10-12T06:33:26Z fix for failures commit 7d50d84f90fcda9e5dec79c9be834870c83443c4 Author: Sandeep Singh Date: 2016-10-12T06:34:23Z Merge branch 'master' into SPARK-17822
[GitHub] spark pull request #15433: [SPARK-17822] Use weak reference in JVMObjectTrac...
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/15433
[GitHub] spark pull request #15433: [SPARK-17822] Use weak reference in JVMObjectTrac...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15433 [SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may leak JVM objects ## What changes were proposed in this pull request? Use weak reference in JVMObjectTracker.objMap because it may leak JVM objects ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-17822 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15433.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15433 commit 7023c40a99eaa81ee7bcd202a4b74df811d0cfc7 Author: Sandeep Singh Date: 2016-10-10T11:54:34Z [SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may leak JVM objects commit 69845947df62187eb40f3cc6468b52e38bdab897 Author: Sandeep Singh Date: 2016-10-10T13:23:56Z Merge branch 'master' into SPARK-17822
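The idea behind this change — a tracking map that does not by itself keep its values alive, so entries disappear once no one else references the object — can be sketched with Python's `weakref.WeakValueDictionary`. This is an illustrative analogue only, not the actual JVMObjectTracker code:

```python
import gc
import weakref

class Tracked:
    """Stand-in for a JVM-side object handed out to the frontend."""
    def __init__(self, name):
        self.name = name

# A WeakValueDictionary drops an entry once the last strong reference
# to its value is gone, so the tracker alone cannot leak objects.
obj_map = weakref.WeakValueDictionary()

obj = Tracked("model-1")
obj_map["id-1"] = obj
assert "id-1" in obj_map   # reachable while a strong ref exists

del obj                    # last strong reference dropped
gc.collect()
assert "id-1" not in obj_map  # entry reclaimed automatically
```

The trade-off, which may be why the author later closed the PR, is that with weak references the map can lose objects the frontend still expects to look up by id unless the frontend itself holds a strong reference.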
[GitHub] spark pull request #13334: [SPARK-15576] Add back hive tests blacklisted by ...
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/13334
[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13767 @srowen yes, the issue is still there.
[GitHub] spark pull request #13767: [MINOR][SQL] Not dropping all necessary tables
GitHub user techaddict reopened a pull request: https://github.com/apache/spark/pull/13767 [MINOR][SQL] Not dropping all necessary tables ## What changes were proposed in this pull request? was not dropping table `parquet_t3` ## How was this patch tested? tested `LogicalPlanToSQLSuite` locally You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark minor-8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13767.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13767 commit a2bab62abf9de24b4f09f1c3a31bcc468f1af8a4 Author: Sandeep Singh Date: 2016-06-19T06:11:28Z [MINOR][SQL] Not dropping all necessary tables Not dropping table `parquet_t3`
[GitHub] spark pull request #13767: [MINOR][SQL] Not dropping all necessary tables
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/13767
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14924 @srowen yes, in stringExpressions the trim is done on UTF8String.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14924 @rxin Done 👍
[GitHub] spark pull request #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips ...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/14924 [SPARK-17299] TRIM/LTRIM/RTRIM should not strip characters other than spaces ## What changes were proposed in this pull request? TRIM/LTRIM/RTRIM should not strip characters other than spaces; we were trimming all chars smaller than ASCII 0x20 (space). ## How was this patch tested? Fixed existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-17299 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14924.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14924 commit 58c2a5dae4fc372ed5b7f2ff8e47ab4d6bb9e76e Author: Sandeep Singh Date: 2016-09-01T19:29:17Z [SPARK-17299] TRIM/LTRIM/RTRIM should not strips characters other than spaces
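The bug and the fix can be sketched in plain Python. The helper names below are hypothetical (the real change is in Spark's UTF8String trim methods); `trim_control_chars` mirrors the old behavior of treating every char at or below 0x20 as trimmable, while `trim_spaces_only` is the fixed behavior:

```python
def trim_spaces_only(s):
    # Fixed behavior per SPARK-17299: strip only the ASCII space character
    return s.strip(" ")

def trim_control_chars(s):
    # Buggy behavior: treat every char <= 0x20 (space and below) as trimmable
    i, j = 0, len(s)
    while i < j and ord(s[i]) <= 0x20:
        i += 1
    while j > i and ord(s[j - 1]) <= 0x20:
        j -= 1
    return s[i:j]

s = "\x01 hello \x01"
assert trim_control_chars(s) == "hello"   # control chars wrongly stripped
assert trim_spaces_only(s) == s           # only leading/trailing spaces qualify
assert trim_spaces_only("  hello  ") == "hello"
```

The difference only shows up for strings padded with control characters (tabs, NULs, etc.), which is why the old behavior went unnoticed for a long time.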
[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/12913 @holdenk Updated the PR, ready for review again.
[GitHub] spark pull request #12913: [SPARK-928][CORE] Add support for Unsafe-based se...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/12913#discussion_r73454617 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -399,6 +399,14 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext { assert(!ser2.getAutoReset) } + private def testBothUnsafeAndSafe(f: SparkConf => Unit): Unit = { --- End diff -- Yes will update the pr today.
[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/11105#discussion_r73330940 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -220,8 +220,27 @@ class TaskMetrics private[spark] () extends Serializable { */ @transient private[spark] lazy val externalAccums = new ArrayBuffer[AccumulatorV2[_, _]] + /** +* All data property accumulators registered with this task. +*/ + @transient private lazy val dataPropertyAccums = new ArrayBuffer[AccumulatorV2[_, _]] + private[spark] def registerAccumulator(a: AccumulatorV2[_, _]): Unit = { externalAccums += a +if (a.dataProperty) { + dataPropertyAccums += a +} + } + + private[spark] def hasDataPropertyAccumulators(): Boolean = { +!dataPropertyAccums.isEmpty --- End diff -- nit: could be nonEmpty
[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/11105#discussion_r73330741 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -220,8 +220,27 @@ class TaskMetrics private[spark] () extends Serializable { */ @transient private[spark] lazy val externalAccums = new ArrayBuffer[AccumulatorV2[_, _]] + /** +* All data property accumulators registered with this task. +*/ + @transient private lazy val dataPropertyAccums = new ArrayBuffer[AccumulatorV2[_, _]] + private[spark] def registerAccumulator(a: AccumulatorV2[_, _]): Unit = { externalAccums += a +if (a.dataProperty) { + dataPropertyAccums += a +} + } + + private[spark] def hasDataPropertyAccumulators(): Boolean = { +!dataPropertyAccums.isEmpty + } + + /** + * Mark an rdd/shuffle/and partition as fully processed for all dataProperty accumulators. + */ + private[spark] def markFullyProcessed(rddId: Int, shuffleWriteId: Int, partitionId: Int) = { +dataPropertyAccums.map(_.markFullyProcessed(rddId, shuffleWriteId, partitionId)) --- End diff -- should be `foreach` instead of `map`
[GitHub] spark issue #13334: [SPARK-15576] Add back hive tests blacklisted by SPARK-1...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13334 @andrewor14 ping.
[GitHub] spark issue #14315: [HOTFIX][BUILD][SPARK-16287][SQL] Fix annotation argumen...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14315 @jaceklaskowski thanks for finding this out. It's weird it passed locally too.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @yhuai sure — should I do the performance testing using a SQL query or an expression?
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan Comment addressed, tests passed 👍
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan np 👍
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan anything else, or is it good to merge?
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r70639783 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -234,6 +234,7 @@ object FunctionRegistry { expression[Subtract]("-"), expression[Multiply]("*"), expression[Divide]("/"), +expression[IntegerDivide]("div"), --- End diff -- @lianhuiwang doing ```div(4,2)``` gives ``` hive> div(4, 2); NoViableAltException(14@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1099) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1249) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1295) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1178) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1166) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) FAILED: ParseException line 1:0 cannot recognize input near 'div' '(' '4' ```
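The semantic split that IntegerDivide targets — SQL `/` always yielding a fractional result, while `div` keeps only the integral part — can be sketched as follows. The helper names are hypothetical, and the truncation-toward-zero behavior is assumed from Hive/MySQL `div` semantics:

```python
def sql_divide(a, b):
    # SQL `/`: NULL on division by zero, otherwise true (fractional) division
    return None if b == 0 else a / b

def sql_div(a, b):
    # SQL `div`: integral part of the quotient. Note this truncates toward
    # zero, unlike Python's floor-division operator `//`.
    return None if b == 0 else int(a / b)

assert sql_divide(7, 2) == 3.5
assert sql_div(7, 2) == 3
assert sql_div(-7, 2) == -3   # truncation, not floor (-7 // 2 == -4)
assert sql_div(7, 0) is None  # SQL division by zero yields NULL
```

Keeping `div` as a separate expression avoids the unnecessary cast-to-double-then-back dance that routing it through Divide would require.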
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @cloud-fan Done.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @cloud-fan Updated the PR, all tests should pass now.
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r70563932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -249,11 +241,12 @@ case class Divide(left: Expression, right: Expression) s"${eval2.value} == 0" } val javaType = ctx.javaType(dataType) -val divide = if (dataType.isInstanceOf[DecimalType]) { +val divide = if (dataType.isInstanceOf[DecimalType] || dataType.isInstanceOf[DoubleType]) { s"${eval1.value}.$decimalMethod(${eval2.value})" } else { - s"($javaType)(${eval1.value} $symbol ${eval2.value})" + s"($javaType)(${eval1.value} $decimalMethod ${eval2.value})" --- End diff -- But what about the `decimalMethod` used on line 245? If we inline the operator `/` there too, it gives ```Binary numeric promotion not possible on types "org.apache.spark.sql.types.Decimal" and "org.apache.spark.sql.types.Decimal"```.
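The distinction under discussion — emitting a method call (e.g. `$div`) for object types like `Decimal`, versus an inline operator for primitives — can be sketched as string templating. This is a hypothetical sketch; the names and the branch condition are illustrative, not Spark's actual codegen:

```python
def gen_divide(lhs: str, rhs: str, java_type: str, is_object_type: bool,
               method: str = "$div", symbol: str = "/") -> str:
    """Render the Java division expression for generated code."""
    if is_object_type:
        # Object types (e.g. Decimal) must use an overloaded method; Java's
        # `/` on them fails with "Binary numeric promotion not possible".
        return f"{lhs}.{method}({rhs})"
    # Primitives can use the operator directly, cast to the result type.
    return f"({java_type})({lhs} {symbol} {rhs})"

print(gen_divide("value1", "value2", "int", is_object_type=False))
# (int)(value1 / value2)
print(gen_divide("value1", "value2", "Decimal", is_object_type=True))
# value1.$div(value2)
```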
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r70562875 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala --- @@ -384,4 +384,39 @@ class StringFunctionsSuite extends QueryTest with SharedSQLContext { }.getMessage assert(m.contains("Invalid number of arguments for function sentences")) } + + test("str_to_map function") { +val df1 = Seq( + ("a=1,b=2", "y"), + ("a=1,b=2,c=3", "y") +).toDF("a", "b") + +checkAnswer( + df1.selectExpr("str_to_map(a,',','=')"), + Seq( +Row(Map("a" -> "1", "b" -> "2")), +Row(Map("a" -> "1", "b" -> "2", "c" -> "3")) + ) +) + +val df2 = Seq(("a:1,b:2,c:3", "y")).toDF("a", "b") + +checkAnswer( + df2.selectExpr("str_to_map(a)"), + Seq(Row(Map("a" -> "1", "b" -> "2", "c" -> "3"))) +) + +// All arguments should be string literals. +val m1 = intercept[AnalysisException]{ + sql("select str_to_map('a:1,b:2,c:3',null,null)").collect() --- End diff -- It gives ```FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': All argument should be string/character type```
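The behavior exercised by these tests can be sketched in plain Python — an illustrative model of the `str_to_map` semantics under discussion, not the Spark implementation:

```python
def str_to_map(text: str, pair_delim: str = ",", kv_delim: str = ":") -> dict:
    """Split text into pairs, then each pair into key/value at the first
    kv_delim; a pair with no delimiter maps its key to None (SQL NULL)."""
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        result[key] = value if sep else None
    return result

print(str_to_map("a=1,b=2", ",", "="))  # {'a': '1', 'b': '2'}
print(str_to_map("a:1,b:2,c:3"))        # {'a': '1', 'b': '2', 'c': '3'}
```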
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan all comments addressed.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r70559727 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +394,54 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text " + +"into key/value pairs using delimiters. " + +"Default delimiters are ',' for pairDelim and ':' for keyValueDelim.", + extended = """ > SELECT _FUNC_('a:1,b:2,c:3',',',':');\n map("a":"1","b":"2","c":"3") """) +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with CodegenFallback{ + + def this(child: Expression, pairDelim: Expression) = { +this(child, pairDelim, Literal(":")) + } + + def this(child: Expression) = { +this(child, Literal(","), Literal(":")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_ == StringType)) { + TypeCheckResult.TypeCheckSuccess +} else if (Seq(pairDelim, keyValueDelim).forall(_.foldable)) { + TypeCheckResult.TypeCheckFailure(s"String To Map's all arguments must be of type string.") +} else { + TypeCheckResult.TypeCheckFailure( --- End diff -- First, check that all children have dataType StringType. If they don't and the delimiters are foldable, fail with "all arguments must be of type string"; otherwise the delimiters themselves are not foldable, which is the second failure case.
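The three-branch check order described in that comment can be sketched as follows (a hypothetical Python model; the type names and messages are illustrative, not Spark's exact wording):

```python
def check_input_types(child_types, delims_foldable: bool) -> str:
    """Mirror the checkInputDataTypes ordering discussed above."""
    if all(t == "string" for t in child_types):
        return "success"
    if delims_foldable:
        # Types are wrong but the delimiters are at least constants.
        return "failure: all arguments must be of type string"
    # Delimiters are non-foldable expressions, a distinct error.
    return "failure: pairDelim and keyValueDelim must be foldable"

print(check_input_types(["string", "string", "string"], True))  # success
print(check_input_types(["string", "int", "string"], True))
```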
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r70471772 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -237,6 +229,9 @@ case class Divide(left: Expression, right: Expression) } } + // Used by doGenCode + protected def divide(eval1: ExprCode, eval2: ExprCode, javaType: String): String --- End diff -- @cloud-fan Yes, but I'm getting ```A method named "$div" is not declared in any enclosing class nor any supertype, nor through a static import``` in the updated PR for `Code generation of (2.0 / 1.0)`.
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r70437720 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -237,6 +229,9 @@ case class Divide(left: Expression, right: Expression) } } + // Used by doGenCode + protected def divide(eval1: ExprCode, eval2: ExprCode, javaType: String): String --- End diff -- I did it on purpose: we can't call `$div` on `byte`s, and if I try to call `value = value1 / value2;` for decimals, I get ```Binary numeric promotion not possible on types "org.apache.spark.sql.types.Decimal" and "org.apache.spark.sql.types.Decimal"```.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r70434309 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +394,84 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text " + +"into key/value pairs using delimiters. " + +"Default delimiters are ',' for pairDelim and ':' for keyValueDelim.", + extended = """ > SELECT _FUNC_('a:1,b:2,c:3',',',':');\n map("a":"1","b":"2","c":"3") """) +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression { + + def this(child: Expression, pairDelim: Expression) = { +this(child, pairDelim, Literal(":")) + } + + def this(child: Expression) = { +this(child, Literal(","), Literal(":")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_ == StringType)) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure(s"String To Map's all arguments should be string literal.") --- End diff -- Should only text be foldable, or all three?
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @rxin no need, I will update this today.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @rxin @cloud-fan done.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69772459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -441,10 +452,15 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E UTF8String[] $keyArray = new UTF8String[$tempArray.length]; UTF8String[] $valueArray = new UTF8String[$tempArray.length]; -for (int $i = 0; $i < $tempArray.length; $i ++) { +for (int $i = 0; $i < $tempArray.length; $i++) { UTF8String[] $keyValue = ($tempArray[$i]).split($keyValueDelim, 2); $keyArray[$i] = $keyValue[0]; - $valueArray[$i] = $keyValue[1]; + if ($keyValue.length < 2) { +$valueArray[$i] = null; + } + else { +$valueArray[$i] = $keyValue[1]; + } --- End diff -- I think that syntax is allowed in Java, and it's much more readable than the ternary conditional notation.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69676815 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into --- End diff -- Not sure about the display: ```[Usage: str_to_map(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and '=' for keyValueDelim.]``` Added an example.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69675997 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into +key/value pairs using delimiters. +Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""") +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with ExpectsInputTypes { + + def this(child: Expression) = { +this(child, Literal(","), Literal("=")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = { +val array = str.asInstanceOf[UTF8String] + .split(delim1.asInstanceOf[UTF8String], -1) + .map{_.split(delim2.asInstanceOf[UTF8String], 2)} + +ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData] + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { + +nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => { + val arrayClass = classOf[GenericArrayData].getName + val mapClass = classOf[ArrayBasedMapData].getName + val keyArray = ctx.freshName("keyArray") + val valueArray = ctx.freshName("valueArray") + ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = null;") + ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;") --- End diff -- Ohh yes, that makes sense. Made the change.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69675457 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into +key/value pairs using delimiters. +Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""") +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with ExpectsInputTypes { + + def this(child: Expression) = { +this(child, Literal(","), Literal("=")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = { +val array = str.asInstanceOf[UTF8String] + .split(delim1.asInstanceOf[UTF8String], -1) + .map{_.split(delim2.asInstanceOf[UTF8String], 2)} + +ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData] + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { + +nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => { + val arrayClass = classOf[GenericArrayData].getName + val mapClass = classOf[ArrayBasedMapData].getName + val keyArray = ctx.freshName("keyArray") + val valueArray = ctx.freshName("valueArray") + ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = null;") + ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;") --- End diff -- And we do the same thing in `CreateMap`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L129
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69675325 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into +key/value pairs using delimiters. +Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""") +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with ExpectsInputTypes { + + def this(child: Expression) = { +this(child, Literal(","), Literal("=")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = { +val array = str.asInstanceOf[UTF8String] + .split(delim1.asInstanceOf[UTF8String], -1) + .map{_.split(delim2.asInstanceOf[UTF8String], 2)} + +ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData] + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { + +nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => { + val arrayClass = classOf[GenericArrayData].getName + val mapClass = classOf[ArrayBasedMapData].getName + val keyArray = ctx.freshName("keyArray") + val valueArray = ctx.freshName("valueArray") + ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = null;") + ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;") --- End diff -- I get ```Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 56, Column 16: "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection" has no field "keyArray"``` for ```java this.keyArray = null; ```
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69670913 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into +key/value pairs using delimiters. +Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""") +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with ExpectsInputTypes { + + def this(child: Expression) = { +this(child, Literal(","), Literal("=")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = { +val array = str.asInstanceOf[UTF8String] + .split(delim1.asInstanceOf[UTF8String], -1) + .map{_.split(delim2.asInstanceOf[UTF8String], 2)} + +ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData] + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { + +nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => { + val arrayClass = classOf[GenericArrayData].getName + val mapClass = classOf[ArrayBasedMapData].getName + val keyArray = ctx.freshName("keyArray") + val valueArray = ctx.freshName("valueArray") + ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = null;") + ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;") --- End diff -- It won't let me assign a value to these vars (https://github.com/apache/spark/pull/13990/files#diff-c1758d627a06084e577be0d33d47f44eR457)
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69605110 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into +key/value pairs using delimiters. +Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""") +case class StringToMap(child: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with ExpectsInputTypes { + + def this(child: Expression) = { +this(child, Literal(","), Literal("=")) + } + + override def children: Seq[Expression] = Seq(child, pairDelim, keyValueDelim) + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) --- End diff -- Not sure.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @cloud-fan addressed all your comments.
[GitHub] spark issue #13334: [SPARK-15576] Add back hive tests blacklisted by SPARK-1...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13334 @andrewor14 I've made the changes, can you take a look now?
[GitHub] spark pull request #13334: [SPARK-15576] Add back hive tests blacklisted by ...
GitHub user techaddict reopened a pull request: https://github.com/apache/spark/pull/13334 [SPARK-15576] Add back hive tests blacklisted by SPARK-15539 ## What changes were proposed in this pull request? Add back hive tests blacklisted by SPARK-15539 ## How was this patch tested? ran HiveCompatibilitySuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-15576 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13334.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13334 commit 6361ac5653733f89d9697101a4cac52f17901c61 Author: Sandeep Singh Date: 2016-05-26T19:34:48Z [SPARK-15576] Add back hive tests blacklisted by SPARK-15539
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r69407742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -285,6 +284,75 @@ case class Divide(left: Expression, right: Expression) } @ExpressionDescription( + usage = "a _FUNC_ b - Divides a by b.", + extended = "> SELECT 3 _FUNC_ 2;\n 1") +case class IntegerDivide(left: Expression, right: Expression) --- End diff -- Done
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r69392406 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -285,6 +284,75 @@ case class Divide(left: Expression, right: Expression) } @ExpressionDescription( + usage = "a _FUNC_ b - Divides a by b.", + extended = "> SELECT 3 _FUNC_ 2;\n 1") +case class IntegerDivide(left: Expression, right: Expression) --- End diff -- Let me try doing that
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r69392402 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -234,6 +234,7 @@ object FunctionRegistry { expression[Subtract]("-"), expression[Multiply]("*"), expression[Divide]("/"), +expression[IntegerDivide]("div"), --- End diff -- I don't think so.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69392299 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after splitting the text into --- End diff -- yupp, that sounds much better; let me make the change
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69392113 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after splitting the text into --- End diff -- Used `delimiter1` and `delimiter2` because it's named that way in Hive.
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r69392080 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimeters + */ +@ExpressionDescription( + usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after splitting the text into --- End diff -- how about `pairDelim` and `pairSeparatorDelim`? I'm not very good with naming; what do you suggest?
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 cc: @cloud-fan @rxin
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 cc: @cloud-fan
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/14036 [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast ## What changes were proposed in this pull request? Add IntegerDivide to avoid unnecessary cast Before: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == CAST((6 / 3) AS BIGINT): bigint Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / 3) AS BIGINT)#5L] +- OneRowRelation$ ... ``` After: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == (6 / 3): int Project [(6 / 3) AS (6 / 3)#11] +- OneRowRelation$ ... ``` ## How was this patch tested? Existing tests and newly added ones You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-16323 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14036.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14036 commit 067b788b05846e659615feef8613f8965573f150 Author: Sandeep Singh Date: 2016-07-03T09:00:11Z [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast Before: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == CAST((6 / 3) AS BIGINT): bigint Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / 3) AS BIGINT)#5L] +- OneRowRelation$ ... ``` After: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == (6 / 3): int Project [(6 / 3) AS (6 / 3)#11] +- OneRowRelation$ ... ``` commit e4e42c35b0236dff6aedf6468a7d94f80bc6023b Author: Sandeep Singh Date: 2016-07-03T09:00:59Z Merge branch 'master' into SPARK-16323
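The plans in the PR description above can be illustrated without Spark. In this sketch, `divViaDouble` and `divIntegral` are hypothetical helpers mirroring, respectively, the pre-patch analyzed plan (cast both operands to double, divide, cast back to bigint) and the proposed `IntegerDivide` (a single integral division); they are not Spark code.

```scala
// Hedged sketch of the two evaluation strategies shown in the explain plans.
// divViaDouble mimics the old plan: cast to double, divide, cast to bigint.
def divViaDouble(a: Int, b: Int): Long = (a.toDouble / b.toDouble).toLong

// divIntegral mimics the proposed IntegerDivide: one integral division, no casts.
def divIntegral(a: Int, b: Int): Int = a / b

println(divViaDouble(7, 2)) // 3
println(divIntegral(7, 2))  // 3
```

Both paths agree on the value; the point of the patch is that the second keeps an `int` result type and skips the two unnecessary casts in the plan.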
[GitHub] spark pull request #14032: [Minor][SQL] Replace Parquet deprecations
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/14032
[GitHub] spark pull request #14032: [Minor][SQL] Replace Parquet deprecations
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/14032 [Minor][SQL] Replace Parquet deprecations ## What changes were proposed in this pull request? 1. Replace `Binary.fromByteArray` with `Binary.fromReusedByteArray` 2. Replace `ConversionPatterns.listType` with `ConversionPatterns.listOfElements` ## How was this patch tested? Existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark depre-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14032.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14032 commit 20aa871a02d08d45f716a9974abe479f077ccd30 Author: Sandeep Singh Date: 2016-07-03T04:45:54Z [Minor][SQL] Replace Parquet deprecations 1. Replace `Binary.fromByteArray` with `Binary.fromReusedByteArray` 2. Replace `ConversionPatterns.listType` with `ConversionPatterns.listOfElements`
[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13767 cc: @srowen
[GitHub] spark pull request #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL ...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/13990 [SPARK-16287][SQL][WIP] Implement str_to_map SQL function ## What changes were proposed in this pull request? This PR adds the `str_to_map` SQL function in order to remove the Hive fallback. ## How was this patch tested? Passes the Jenkins tests, including newly added ones. You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-16287 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13990.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13990 commit af59f57cecd93de49ec5bd20058199d93a9f2445 Author: Sandeep Singh Date: 2016-06-30T03:54:05Z First pass without arguments commit dc6b1f439e32768828bdb7d1a10f8b8178fa4c13 Author: Sandeep Singh Date: 2016-06-30T04:32:54Z Add delimiter options commit a8e6631edf6d124f218b15589427664f5b454759 Author: Sandeep Singh Date: 2016-06-30T04:36:08Z Merge master commit 1f888abb532c905dac11b404819786fd2641e38f Author: Sandeep Singh Date: 2016-06-30T04:37:13Z merge fix
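For readers unfamiliar with the Hive semantics being ported, here is a minimal sketch of what `str_to_map` computes. The `strToMap` helper below is purely illustrative (not the Spark implementation), assuming Hive's default delimiters `,` between pairs and `:` between key and value.

```scala
// Hedged sketch of str_to_map semantics: split the text into pairs on
// pairDelim, then split each pair into key/value on keyValueDelim.
def strToMap(text: String,
             pairDelim: String = ",",
             keyValueDelim: String = ":"): Map[String, String] =
  text.split(pairDelim).map { pair =>
    // limit = 2 keeps any later keyValueDelim characters inside the value
    val Array(k, v) = pair.split(keyValueDelim, 2)
    k -> v
  }.toMap

println(strToMap("a:1,b:2,c:3")) // Map(a -> 1, b -> 2, c -> 3)
```

Custom delimiters work the same way, e.g. `strToMap("k=v;x=y", ";", "=")` splits on `;` and `=` instead.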
[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13767 jenkins retest this please
[GitHub] spark pull request #13767: [MINOR][SQL] Not dropping all necessary tables
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/13767 [MINOR][SQL] Not dropping all necessary tables ## What changes were proposed in this pull request? was not dropping table `parquet_t3` ## How was this patch tested? tested `LogicalPlanToSQLSuite` locally You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark minor-8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13767.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13767 commit a2bab62abf9de24b4f09f1c3a31bcc468f1af8a4 Author: Sandeep Singh Date: 2016-06-19T06:11:28Z [MINOR][SQL] Not dropping all necessary tables Not dropping table `parquet_t3`