[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163422934

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -68,13 +68,16 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
   }

   allFileBasedDataSources.foreach { format =>
-    test(s"SPARK-22146 read files containing special characters using $format") {
-      val nameWithSpecialChars = s"sp%chars"
-      withTempDir { dir =>
-        val tmpFile = s"$dir/$nameWithSpecialChars"
-        spark.createDataset(Seq("a", "b")).write.format(format).save(tmpFile)
-        val fileContent = spark.read.format(format).load(tmpFile)
-        checkAnswer(fileContent, Seq(Row("a"), Row("b")))
+    test(s"SPARK-22146 / SPARK-23148 read files containing special characters using $format") {
+      val nameWithSpecialChars = s"sp%c hars"
+      Seq(true, false).foreach { multiline =>
--- End diff --

Sounds good to me.

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20365 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86546/ Test PASSed.
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20365 Merged build finished. Test PASSed.
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20365

**[Test build #86546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86546/testReport)** for PR 20365 at commit [`7209792`](https://github.com/apache/spark/commit/72097921f33492160a2784e108d2eb61fa543672).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20368
[GitHub] spark issue #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20368 Thanks! Merged to master/2.3
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163421661

--- Diff: python/pyspark/cloudpickle.py ---
@@ -1087,13 +1038,6 @@ def _find_module(mod_name):
         file.close()
     return path, description

-def _load_namedtuple(name, fields):
-    """
-    Loads a class generated by namedtuple
-    """
-    from collections import namedtuple
-    return namedtuple(name, fields)
-
--- End diff --

This didn't seem necessary anymore after the fix for namedtuples
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163421434

--- Diff: python/pyspark/cloudpickle.py ---
@@ -1019,18 +948,40 @@ def __reduce__(cls):
         return cls.__name__

-def _fill_function(func, globals, defaults, dict, module, closure_values):
-    """ Fills in the rest of function data into the skeleton function object
-    that were created via _make_skel_func().
+def _fill_function(*args):
+    """Fills in the rest of function data into the skeleton function object
+
+    The skeleton itself is create by _make_skel_func().
--- End diff --

Restore compatibility with functions pickled with 0.4.0 (#128) https://github.com/cloudpipe/cloudpickle/commit/7d8c670b703a683d6fd7e642c6bec8a487594d20
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163421517

--- Diff: python/pyspark/cloudpickle.py ---
@@ -1019,18 +948,40 @@ def __reduce__(cls):
         return cls.__name__

-def _fill_function(func, globals, defaults, dict, module, closure_values):
-    """ Fills in the rest of function data into the skeleton function object
-    that were created via _make_skel_func().
+def _fill_function(*args):
+    """Fills in the rest of function data into the skeleton function object
+
+    The skeleton itself is create by _make_skel_func().
     """
-    func.__globals__.update(globals)
-    func.__defaults__ = defaults
-    func.__dict__ = dict
-    func.__module__ = module
+    if len(args) == 2:
+        func = args[0]
+        state = args[1]
+    elif len(args) == 5:
+        # Backwards compat for cloudpickle v0.4.0, after which the `module`
+        # argument was introduced
+        func = args[0]
+        keys = ['globals', 'defaults', 'dict', 'closure_values']
+        state = dict(zip(keys, args[1:]))
+    elif len(args) == 6:
+        # Backwards compat for cloudpickle v0.4.1, after which the function
+        # state was passed as a dict to the _fill_function it-self.
+        func = args[0]
+        keys = ['globals', 'defaults', 'dict', 'module', 'closure_values']
+        state = dict(zip(keys, args[1:]))
+    else:
+        raise ValueError('Unexpected _fill_value arguments: %r' % (args,))
+
+    func.__globals__.update(state['globals'])
+    func.__defaults__ = state['defaults']
+    func.__dict__ = state['dict']
+    if 'module' in state:
+        func.__module__ = state['module']
+    if 'qualname' in state:
+        func.__qualname__ = state['qualname']
--- End diff --

Preserve func.__qualname__ when defined https://github.com/cloudpipe/cloudpickle/commit/14b38a3ab5970d96cce1492c790494932285f845
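The version-tolerant dispatch in the hunk above — accept either the new `(func, state_dict)` form or the older positional layouts — can be sketched in isolation. The function and key names below mirror the diff but this is an illustrative standalone version, not cloudpickle's actual internals:

```python
def fill_state(*args):
    """Normalize the old positional unpickling layouts into the new
    (func, state_dict) form, keyed on argument count."""
    if len(args) == 2:
        # current format: (func, state_dict)
        func, state = args
    elif len(args) == 5:
        # pre-'module' positional format
        func = args[0]
        state = dict(zip(['globals', 'defaults', 'dict', 'closure_values'],
                         args[1:]))
    elif len(args) == 6:
        # positional format that included 'module'
        func = args[0]
        state = dict(zip(['globals', 'defaults', 'dict', 'module',
                          'closure_values'], args[1:]))
    else:
        raise ValueError('unexpected arguments: %r' % (args,))
    return func, state
```

The key point of the design is that the new writer always emits one dict, while the reader keeps accepting every shape older writers ever produced.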
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163421285

--- Diff: python/pyspark/cloudpickle.py ---
@@ -913,11 +841,12 @@ def dump(obj, file, protocol=2):

 def dumps(obj, protocol=2):
     file = StringIO()
-
-    cp = CloudPickler(file,protocol)
-    cp.dump(obj)
-
-    return file.getvalue()
+    try:
+        cp = CloudPickler(file,protocol)
+        cp.dump(obj)
+        return file.getvalue()
+    finally:
+        file.close()
--- End diff --

Close StringIO timely on exception https://github.com/cloudpipe/cloudpickle/commit/ca4661b3a20b635f4c240ef763f5759267d74cb9
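The pattern being backported here — guarantee the in-memory buffer is closed even when pickling raises — is just a try/finally around the dump. A minimal sketch using stdlib `pickle` and `io.BytesIO` in place of cloudpickle's `CloudPickler`/`StringIO`:

```python
import io
import pickle

def dumps(obj, protocol=2):
    """Serialize obj to bytes, closing the buffer even if dump() raises."""
    file = io.BytesIO()
    try:
        pickle.Pickler(file, protocol).dump(obj)
        return file.getvalue()
    finally:
        file.close()  # released promptly instead of waiting for GC
```

Without the finally clause, a pickling error would leave the buffer open until garbage collection, which is what the upstream commit fixes.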
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163421163

--- Diff: python/pyspark/cloudpickle.py ---
@@ -867,23 +797,21 @@ def save_not_implemented(self, obj):
     dispatch[type(Ellipsis)] = save_ellipsis
     dispatch[type(NotImplemented)] = save_not_implemented

-    # WeakSet was added in 2.7.
-    if hasattr(weakref, 'WeakSet'):
-        def save_weakset(self, obj):
-            self.save_reduce(weakref.WeakSet, (list(obj),))
-
-        dispatch[weakref.WeakSet] = save_weakset
+    def save_weakset(self, obj):
+        self.save_reduce(weakref.WeakSet, (list(obj),))

-    """Special functions for Add-on libraries"""
-    def inject_addons(self):
-        """Plug in system. Register additional pickling functions if modules already loaded"""
-        pass
+    dispatch[weakref.WeakSet] = save_weakset

     def save_logger(self, obj):
         self.save_reduce(logging.getLogger, (obj.name,), obj=obj)

     dispatch[logging.Logger] = save_logger

+    """Special functions for Add-on libraries"""
+    def inject_addons(self):
+        """Plug in system. Register additional pickling functions if modules already loaded"""
+        pass
+
--- End diff --

Further cleanups https://github.com/cloudpipe/cloudpickle/commit/c91aaf110441991307f5097f950764079d0f9652
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163421005

--- Diff: python/pyspark/cloudpickle.py ---
@@ -754,64 +742,6 @@ def __getattribute__(self, item):
     if type(operator.attrgetter) is type:
         dispatch[operator.attrgetter] = save_attrgetter

-    def save_reduce(self, func, args, state=None,
-                    listitems=None, dictitems=None, obj=None):
-        # Assert that args is a tuple or None
-        if not isinstance(args, tuple):
-            raise pickle.PicklingError("args from reduce() should be a tuple")
-
-        # Assert that func is callable
-        if not hasattr(func, '__call__'):
-            raise pickle.PicklingError("func from reduce should be callable")
-
-        save = self.save
-        write = self.write
-
-        # Protocol 2 special case: if func's name is __newobj__, use NEWOBJ
-        if self.proto >= 2 and getattr(func, "__name__", "") == "__newobj__":
-            cls = args[0]
-            if not hasattr(cls, "__new__"):
-                raise pickle.PicklingError(
-                    "args[0] from __newobj__ args has no __new__")
-            if obj is not None and cls is not obj.__class__:
-                raise pickle.PicklingError(
-                    "args[0] from __newobj__ args has the wrong class")
-            args = args[1:]
-            save(cls)
-
-            save(args)
-            write(pickle.NEWOBJ)
-        else:
-            save(func)
-            save(args)
-            write(pickle.REDUCE)
-
-        if obj is not None:
-            self.memoize(obj)
-
-        # More new special cases (that work with older protocols as
-        # well): when __reduce__ returns a tuple with 4 or 5 items,
-        # the 4th and 5th item should be iterators that provide list
-        # items and dict items (as (key, value) tuples), or None.
-
-        if listitems is not None:
-            self._batch_appends(listitems)
-
-        if dictitems is not None:
-            self._batch_setitems(dictitems)
-
-        if state is not None:
-            save(state)
-            write(pickle.BUILD)
-
-    def save_partial(self, obj):
-        """Partial objects do not serialize correctly in python2.x -- this fixes the bugs"""
-        self.save_reduce(_genpartial, (obj.func, obj.args, obj.keywords))
-
-    if sys.version_info < (2,7):  # 2.7 supports partial pickling
-        dispatch[partial] = save_partial
-
--- End diff --

Remove save_reduce() override: it is exactly the same code as in Python 2's Pickler class. https://github.com/cloudpipe/cloudpickle/commit/2da4c243ceddebbc2febf116eb6e53035fed9b9a
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20371 Merged build finished. Test PASSed.
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163420818

--- Diff: python/pyspark/cloudpickle.py ---
@@ -709,12 +702,7 @@ def save_property(self, obj):
     dispatch[property] = save_property

     def save_classmethod(self, obj):
-        try:
-            orig_func = obj.__func__
-        except AttributeError:  # Python 2.6
--- End diff --

support for Python 2.6 removed
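For context on why the fallback could be dropped: on Python 2.7 and later, the classmethod object stored in a class's `__dict__` exposes the wrapped function directly via `__func__`, so no `AttributeError` handling is needed. A small illustrative check (the `Widget` class is hypothetical):

```python
class Widget:
    @classmethod
    def create(cls):
        # classmethods receive the class, so this builds an instance
        return cls()

# the raw classmethod descriptor lives in the class dict
cm = Widget.__dict__['create']

# __func__ on the descriptor is the same underlying function that the
# bound classmethod exposes
assert cm.__func__ is Widget.create.__func__
```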
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86550/ Test PASSed.
[GitHub] spark issue #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20368 Merged build finished. Test PASSed.
[GitHub] spark issue #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86545/ Test PASSed.
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371

**[Test build #86550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86550/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163420703

--- Diff: python/pyspark/cloudpickle.py ---
@@ -608,37 +620,18 @@ def save_global(self, obj, name=None, pack=struct.pack):
         The name of this method is somewhat misleading: all types get
         dispatched here.
         """
-        if obj.__module__ == "__builtin__" or obj.__module__ == "builtins":
-            if obj in _BUILTIN_TYPE_NAMES:
-                return self.save_reduce(_builtin_type, (_BUILTIN_TYPE_NAMES[obj],), obj=obj)
-
-        if name is None:
-            name = obj.__name__
-
-        modname = getattr(obj, "__module__", None)
-        if modname is None:
-            try:
-                # whichmodule() could fail, see
-                # https://bitbucket.org/gutworth/six/issues/63/importing-six-breaks-pickling
-                modname = pickle.whichmodule(obj, name)
-            except Exception:
-                modname = '__main__'
-
-        if modname == '__main__':
-            themodule = None
-        else:
-            __import__(modname)
-            themodule = sys.modules[modname]
-            self.modules.add(themodule)
+        try:
+            return Pickler.save_global(self, obj, name=name)
+        except Exception:
+            if obj.__module__ == "__builtin__" or obj.__module__ == "builtins":
+                if obj in _BUILTIN_TYPE_NAMES:
+                    return self.save_reduce(_builtin_type, (_BUILTIN_TYPE_NAMES[obj],), obj=obj)
--- End diff --

Some cleanups, fix memoryview support https://github.com/cloudpipe/cloudpickle/commit/f8187e90aed7e1b96ffaae85cdf4b37108c75d3f
[GitHub] spark issue #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20368

**[Test build #86545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86545/testReport)** for PR 20368 at commit [`21e5321`](https://github.com/apache/spark/commit/21e5321d072c312e243407af08eeb9c1a796ab4d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371

**[Test build #4075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4075/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371

**[Test build #4073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4073/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371

**[Test build #4072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4072/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371

**[Test build #4074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4074/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163420474

--- Diff: python/pyspark/cloudpickle.py ---
@@ -522,17 +529,22 @@ def save_function_tuple(self, func):
         self.memoize(func)

         # save the rest of the func data needed by _fill_function
-        save(f_globals)
-        save(defaults)
-        save(dct)
-        save(func.__module__)
-        save(closure_values)
+        state = {
+            'globals': f_globals,
+            'defaults': defaults,
+            'dict': dct,
+            'module': func.__module__,
+            'closure_values': closure_values,
+        }
+        if hasattr(func, '__qualname__'):
+            state['qualname'] = func.__qualname__
+        save(state)
--- End diff --

Preserve func.__qualname__ when defined https://github.com/cloudpipe/cloudpickle/commit/14b38a3ab5970d96cce1492c790494932285f845
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371

**[Test build #4076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4076/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163420127

--- Diff: python/pyspark/cloudpickle.py ---
@@ -420,20 +440,18 @@ def save_dynamic_class(self, obj):
         from global modules.
         """
         clsdict = dict(obj.__dict__)  # copy dict proxy to a dict
-        if not isinstance(clsdict.get('__dict__', None), property):
-            # don't extract dict that are properties
-            clsdict.pop('__dict__', None)
-            clsdict.pop('__weakref__', None)
-
-        # hack as __new__ is stored differently in the __dict__
-        new_override = clsdict.get('__new__', None)
-        if new_override:
-            clsdict['__new__'] = obj.__new__
-
-        # namedtuple is a special case for Spark where we use the _load_namedtuple function
-        if getattr(obj, '_is_namedtuple_', False):
-            self.save_reduce(_load_namedtuple, (obj.__name__, obj._fields))
-            return
+        clsdict.pop('__weakref__', None)
+
+        # On PyPy, __doc__ is a readonly attribute, so we need to include it in
+        # the initial skeleton class.  This is safe because we know that the
+        # doc can't participate in a cycle with the original class.
+        type_kwargs = {'__doc__': clsdict.pop('__doc__', None)}
+
+        # If type overrides __dict__ as a property, include it in the type kwargs.
+        # In Python 2, we can't set this attribute after construction.
+        __dict__ = clsdict.pop('__dict__', None)
+        if isinstance(__dict__, property):
+            type_kwargs['__dict__'] = __dict__
--- End diff --

BUG: Fix bug pickling namedtuple https://github.com/cloudpipe/cloudpickle/commit/28070bba79cf71e5719ab8d7c1d6cbc72cd95a0c
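The fix above replaces Spark's namedtuple special case with generic dynamic-class handling: copy the class `__dict__`, drop `__weakref__`, and route `__doc__` (and a `__dict__` property, when one exists) through the skeleton class's constructor arguments. The state preparation can be sketched standalone like this (the `Point` class is illustrative, and the rebuild is simplified relative to cloudpickle's actual two-phase skeleton approach):

```python
class Point:
    """A 2-D point."""
    def __init__(self, x, y):
        self.x, self.y = x, y

# copy the mappingproxy into a real dict, as save_dynamic_class does
clsdict = dict(Point.__dict__)
clsdict.pop('__weakref__', None)
# __doc__ goes into the skeleton's constructor args (readonly on PyPy)
type_kwargs = {'__doc__': clsdict.pop('__doc__', None)}
# here __dict__ is an ordinary descriptor, not a property, so just drop it
clsdict.pop('__dict__', None)

# rebuild a working class from the captured state
Rebuilt = type('Point', (object,), dict(clsdict, **type_kwargs))
```

Because this path serializes the class definition by value rather than by module reference, classes such as dynamically created namedtuples survive the round trip without the `_load_namedtuple` hack.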
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163419942

--- Diff: python/pyspark/cloudpickle.py ---
@@ -318,6 +329,18 @@ def save_function(self, obj, name=None):
         Determines what kind of function obj is (e.g. lambda, defined at
         interactive prompt, etc) and handles the pickling appropriately.
         """
+        if obj in _BUILTIN_TYPE_CONSTRUCTORS:
+            # We keep a special-cased cache of built-in type constructors at
+            # global scope, because these functions are structured very
+            # differently in different python versions and implementations (for
+            # example, they're instances of types.BuiltinFunctionType in
+            # CPython, but they're ordinary types.FunctionType instances in
+            # PyPy).
+            #
+            # If the function we've received is in that cache, we just
+            # serialize it as a lookup into the cache.
+            return self.save_reduce(_BUILTIN_TYPE_CONSTRUCTORS[obj], (), obj=obj)
+
--- End diff --

BUG: Hit the builtin type cache for any function https://github.com/cloudpipe/cloudpickle/commit/d84980ccaafc7982a50d4e04064011f401f17d1b
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163419648

--- Diff: python/pyspark/cloudpickle.py ---
@@ -237,28 +262,14 @@ def dump(self, obj):
             if 'recursion' in e.args[0]:
                 msg = """Could not pickle object as excessively deep recursion required."""
                 raise pickle.PicklingError(msg)
-        except pickle.PickleError:
-            raise
-        except Exception as e:
-            emsg = _exception_message(e)
-            if "'i' format requires" in emsg:
-                msg = "Object too large to serialize: %s" % emsg
-            else:
-                msg = "Could not serialize object: %s: %s" % (e.__class__.__name__, emsg)
-            print_exec(sys.stderr)
-            raise pickle.PicklingError(msg)
-
     def save_memoryview(self, obj):
-        """Fallback to save_string"""
-        Pickler.save_string(self, str(obj))
+        self.save(obj.tobytes())
+
+    dispatch[memoryview] = save_memoryview
--- End diff --

Some cleanups, fix memoryview support https://github.com/cloudpipe/cloudpickle/commit/f8187e90aed7e1b96ffaae85cdf4b37108c75d3f
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163419493

--- Diff: python/pyspark/cloudpickle.py ---
@@ -237,28 +262,14 @@ def dump(self, obj):
             if 'recursion' in e.args[0]:
                 msg = """Could not pickle object as excessively deep recursion required."""
                 raise pickle.PicklingError(msg)
-        except pickle.PickleError:
-            raise
-        except Exception as e:
-            emsg = _exception_message(e)
-            if "'i' format requires" in emsg:
-                msg = "Object too large to serialize: %s" % emsg
-            else:
-                msg = "Could not serialize object: %s: %s" % (e.__class__.__name__, emsg)
-            print_exec(sys.stderr)
-            raise pickle.PicklingError(msg)
-
--- End diff --

This exception handling is Spark specific; it has been moved to serializers.py `CloudPickleSerializer.dumps`.
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20373#discussion_r163419329

--- Diff: python/pyspark/cloudpickle.py ---
@@ -181,6 +180,32 @@ def _builtin_type(name):
     return getattr(types, name)

+def _make__new__factory(type_):
+    def _factory():
+        return type_.__new__
+    return _factory
+
+
+# NOTE: These need to be module globals so that they're pickleable as globals.
+_get_dict_new = _make__new__factory(dict)
+_get_frozenset_new = _make__new__factory(frozenset)
+_get_list_new = _make__new__factory(list)
+_get_set_new = _make__new__factory(set)
+_get_tuple_new = _make__new__factory(tuple)
+_get_object_new = _make__new__factory(object)
+
+# Pre-defined set of builtin_function_or_method instances that can be
+# serialized.
+_BUILTIN_TYPE_CONSTRUCTORS = {
+    dict.__new__: _get_dict_new,
+    frozenset.__new__: _get_frozenset_new,
+    set.__new__: _get_set_new,
+    list.__new__: _get_list_new,
+    tuple.__new__: _get_tuple_new,
+    object.__new__: _get_object_new,
+}
+
--- End diff --

MAINT: Handle builtin type __new__ attrs: https://github.com/cloudpipe/cloudpickle/commit/f0d2011f9fc88105c174b7c861f2c2f56e870350
[GitHub] spark issue #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to match 0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20373 Merged build finished. Test PASSed.
[GitHub] spark issue #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to match 0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/161/ Test PASSed.
[GitHub] spark issue #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to match 0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20373

**[Test build #86553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86553/testReport)** for PR 20373 at commit [`c362df8`](https://github.com/apache/spark/commit/c362df87e2d5a5f55d2e4f7d48e24b2d7cfda6f7).
[GitHub] spark pull request #20373: [WIP][SPARK-23159][PYTHON] Update cloudpickle to ...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/20373

[WIP][SPARK-23159][PYTHON] Update cloudpickle to match 0.4.2

## What changes were proposed in this pull request?

The version of cloudpickle in PySpark was close to version 0.4.0 with some additional backported fixes and some minor additions for Spark-related things. With version 0.4.2 we can remove the Spark-related changes and make the version in Spark exactly match 0.4.2 at https://github.com/cloudpipe/cloudpickle/tree/v0.4.2

Changes from updating to 0.4.2 include:
* Fix pickling of named tuples https://github.com/cloudpipe/cloudpickle/pull/113
* Built-in type constructors for PyPy compatibility [here](https://github.com/cloudpipe/cloudpickle/commit/d84980ccaafc7982a50d4e04064011f401f17d1b)
* Fix memoryview support https://github.com/cloudpipe/cloudpickle/pull/122
* Improved compatibility with other cloudpickle versions https://github.com/cloudpipe/cloudpickle/pull/128
* Several cleanups https://github.com/cloudpipe/cloudpickle/pull/121 and [here](https://github.com/cloudpipe/cloudpickle/commit/c91aaf110441991307f5097f950764079d0f9652)

## How was this patch tested?

Existing pyspark.tests using Python 2.7.14 and 3.5.2

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-update-cloudpickle-42-SPARK-23159

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20373.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20373

commit 89f13b857dba53754f6813efae2d0ca4540c48f4
Author: Bryan Cutler
Date: 2018-01-23T23:25:29Z

    updated cloudpickle to match 0.4.2

commit c362df87e2d5a5f55d2e4f7d48e24b2d7cfda6f7
Author: Bryan Cutler
Date: 2018-01-23T23:55:25Z

    removed unused import
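As a concrete illustration of the namedtuple problem this update addresses: stdlib pickle serializes classes by reference (module plus attribute lookup), so a namedtuple class whose name does not resolve in its defining module cannot be pickled at all, whereas cloudpickle 0.4.2 serializes the class definition by value. A minimal, stdlib-only reproduction of the failure mode (the `Pt`/`Point` names are illustrative):

```python
import pickle
from collections import namedtuple

# The bound name ('Pt') differs from the class's own name ('Point'), so
# stdlib pickle's by-reference lookup of 'Point' in this module fails.
Pt = namedtuple('Point', ['x', 'y'])

try:
    pickle.dumps(Pt(1, 2))
    stdlib_pickle_failed = False
except (pickle.PicklingError, AttributeError):
    stdlib_pickle_failed = True
```

This is exactly the situation PySpark hits when namedtuples are created dynamically on the driver and shipped to executors, which is why cloudpickle's by-value class serialization matters here.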
[GitHub] spark pull request #20371: [SPARK-23197][DStreams] Increased timeouts to res...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20371#discussion_r163415477

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/ReceiverSuite.scala ---
@@ -105,13 +105,13 @@ class ReceiverSuite extends TestSuiteBase with TimeLimits with Serializable {
     assert(executor.errors.head.eq(exception))

     // Verify restarting actually stops and starts the receiver
-    receiver.restart("restarting", null, 100)
-    eventually(timeout(50 millis), interval(10 millis)) {
+    receiver.restart("restarting", null, 600)
--- End diff --

yeah. lets fix one at a time.
[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163414704

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -68,13 +68,16 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
   }

   allFileBasedDataSources.foreach { format =>
-    test(s"SPARK-22146 read files containing special characters using $format") {
-      val nameWithSpecialChars = s"sp%chars"
-      withTempDir { dir =>
-        val tmpFile = s"$dir/$nameWithSpecialChars"
-        spark.createDataset(Seq("a", "b")).write.format(format).save(tmpFile)
-        val fileContent = spark.read.format(format).load(tmpFile)
-        checkAnswer(fileContent, Seq(Row("a"), Row("b")))
+    test(s"SPARK-22146 / SPARK-23148 read files containing special characters using $format") {
+      val nameWithSpecialChars = s"sp%c hars"
+      Seq(true, false).foreach { multiline =>
--- End diff --

Less dup is fine but this case slightly confuses like orc and parquet support multiline, and runs duplicated tests as you pointed out if I should nitpick. I think I prefer a separate test.
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20371 LGTM, pending jenkins. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20371: [SPARK-23197][DStreams] Increased timeouts to res...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/20371#discussion_r163412164 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/ReceiverSuite.scala --- @@ -105,13 +105,13 @@ class ReceiverSuite extends TestSuiteBase with TimeLimits with Serializable { assert(executor.errors.head.eq(exception)) // Verify restarting actually stops and starts the receiver -receiver.restart("restarting", null, 100) -eventually(timeout(50 millis), interval(10 millis)) { +receiver.restart("restarting", null, 600) --- End diff -- If these timeout bumps fix the flakiness, we should also consider enabling the "block generator throttling" test below (it was disabled via https://github.com/apache/spark/commit/b69c4f9b2e8544f1b178db2aefbcaa166f76cb7a) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
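The fix under review just raises the restart delay and polling timeouts. The underlying pattern is ScalaTest's `eventually`, which repeatedly polls a condition until a deadline; a too-small window (like the original 50 ms against a 100 ms restart delay) can expire before the receiver has restarted, which is exactly the flakiness being fixed. As a hedged, self-contained sketch in plain Scala (not the actual ScalaTest API):

```scala
// Minimal stand-in for ScalaTest's `eventually`: poll `condition` every
// `intervalMs` until it holds or `timeoutMs` elapses. A generous timeout
// trades a little wall-clock time for far less flakiness on loaded CI hosts.
def eventuallyTrue(timeoutMs: Long, intervalMs: Long)(condition: => Boolean): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var ok = condition
  while (!ok && System.currentTimeMillis() < deadline) {
    Thread.sleep(intervalMs)
    ok = condition
  }
  ok
}
```

The PR's change is the same idea applied to the real test: bump both the restart delay and the `eventually` window so the asserted state has time to materialize.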
[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20355 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20355 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/160/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163411267 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala --- @@ -172,6 +172,14 @@ class TextSuite extends QueryTest with SharedSQLContext { } } + test("SPARK-23148: test for spaces in file names") { --- End diff -- In the end, to reduce code duplication, I made it so that orc and parquet run multiline as well (I tried to find a neat way to only run multiline if the format was csv, text or json without having a separate test case but it just complicated things). Let me know if you'd rather I have two separate test cases to avoid running the two redundant cases with orc / parquet. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20360 **[Test build #86551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86551/testReport)** for PR 20360 at commit [`74684a7`](https://github.com/apache/spark/commit/74684a7d10009ef970d7d674d9c695b695c5da5c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20355 **[Test build #86552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86552/testReport)** for PR 20355 at commit [`740def4`](https://github.com/apache/spark/commit/740def4c9a96a7dba5a8f57c49042dee661608b6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20360 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/159/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20360 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20360#discussion_r163410447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] { private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = { expr.find { - e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined --- End diff -- Yes. Updated. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20372: Improved block merging logic for partitions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20372 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20372: Improved block merging logic for partitions
GitHub user glentakahashi opened a pull request: https://github.com/apache/spark/pull/20372 Improved block merging logic for partitions ## What changes were proposed in this pull request? Change DataSourceScanExec so that, when grouping blocks together into partitions, it also checks the end of the sorted list of splits to more efficiently fill out partitions. ## How was this patch tested? Updated the old test to reflect the new logic, which causes the # of partitions to drop from 4 -> 3 You can merge this pull request into a Git repository by running: $ git pull https://github.com/glentakahashi/spark feature/improved-block-merging Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20372.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20372 commit c575977a5952bf50b605be8079c9be1e30f3bd36 Author: Glen Takahashi Date: 2018-01-23T23:22:34Z Improved block merging logic for partitions
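The description above can be sketched as a greedy bin-packing pass. This is an illustrative reconstruction, not the actual `DataSourceScanExec` code: `FileSplitStub`, `mergeSplits`, and `maxSplitBytes` are hypothetical names, and the real implementation works on Spark's internal file-split types.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch of the proposed merging: sort splits by size
// descending, start each partition with the largest remaining split,
// then top it up with the smallest splits from the *end* of the sorted
// list while they still fit under the byte budget.
case class FileSplitStub(path: String, length: Long)

def mergeSplits(splits: Seq[FileSplitStub], maxSplitBytes: Long): Seq[Seq[FileSplitStub]] = {
  val remaining = splits.sortBy(-_.length).toBuffer
  val partitions = ArrayBuffer.empty[Seq[FileSplitStub]]
  while (remaining.nonEmpty) {
    val current = ArrayBuffer(remaining.remove(0))
    var bytes = current.head.length
    // Check the end of the sorted list: the smallest splits may still fit.
    while (remaining.nonEmpty && bytes + remaining.last.length <= maxSplitBytes) {
      bytes += remaining.last.length
      current += remaining.remove(remaining.size - 1)
    }
    partitions += current.toSeq
  }
  partitions.toSeq
}
```

For example, with split sizes 4, 3, 2, 1 and a budget of 5, front-only grouping yields three partitions ([4], [3, 2], [1]) while this end-filling pass yields two ([4, 1], [3, 2]) — the same kind of partition-count drop the PR reports in its updated test.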
[GitHub] spark pull request #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20368#discussion_r163404464 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -63,7 +63,7 @@ case class InMemoryRelation( tableName: Option[String])( @transient var _cachedColumnBuffers: RDD[CachedBatch] = null, val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator, -statsOfPlanToCache: Statistics = null) +statsOfPlanToCache: Statistics) --- End diff -- Setting `null` by default is risky, because we might hit `NullPointerException`.
[GitHub] spark pull request #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cac...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20365 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20370: Changing JDBC relation to better process quotes
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20370 Hi, @conorbmurphy. Could you add a test case for your contribution, too?
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20365 Thanks! Merged to master/2.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20370: Changing JDBC relation to better process quotes
Github user conorbmurphy commented on the issue: https://github.com/apache/spark/pull/20370 Will do! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20365 Since the last change is just to change the test case name, I merge this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20371 @sameeragarwal this PR should fix this flakiness. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20370: Changing JDBC relation to better process quotes
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20370 @conorbmurphy Could you create a JIRA and follow [the instruction](https://spark.apache.org/contributing.html) to make a contribution? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20370: Changing JDBC relation to better process quotes
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20370#discussion_r163403070 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -78,7 +78,8 @@ private[sql] object JDBCRelation extends Logging { // Overflow and silliness can happen if you subtract then divide. // Here we get a little roundoff, but that's (hopefully) OK. val stride: Long = upperBound / numPartitions - lowerBound / numPartitions -val column = partitioning.column +val dialect = JdbcDialects.get(jdbcOptions.url) +val column = dialect.quoteIdentifier(partitioning.column) --- End diff -- We also need to add a test case in `PostgresIntegrationSuite` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371 **[Test build #86550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86550/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20370: Changing JDBC relation to better process quotes
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20370#discussion_r163402979 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -78,7 +78,8 @@ private[sql] object JDBCRelation extends Logging { // Overflow and silliness can happen if you subtract then divide. // Here we get a little roundoff, but that's (hopefully) OK. val stride: Long = upperBound / numPartitions - lowerBound / numPartitions -val column = partitioning.column +val dialect = JdbcDialects.get(jdbcOptions.url) +val column = dialect.quoteIdentifier(partitioning.column) --- End diff -- We should do it in `class JDBCOptions`. To avoid breaking the behavior, we should eat the quotes if users manually specify them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
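A hedged sketch of the suggestion above: normalize the partition column once (e.g. in `JDBCOptions`), eating any quotes the user supplied manually before applying the dialect's own quoting, so manually quoted and unquoted column names behave the same. `quoteIdentifier` below mimics the double-quote style of dialects such as PostgreSQL; in Spark the real logic would come from `JdbcDialects.get(url)`, and `normalizePartitionColumn` is a hypothetical helper, not an existing API.

```scala
// Hypothetical stand-in for JdbcDialect.quoteIdentifier (PostgreSQL style).
def quoteIdentifier(colName: String): String = "\"" + colName + "\""

// Eat user-supplied quotes, then re-quote consistently, so
// partitionColumn = myCol and partitionColumn = "myCol" are equivalent.
def normalizePartitionColumn(raw: String): String = {
  val unquoted =
    if (raw.length >= 2 && raw.head == '"' && raw.last == '"') raw.substring(1, raw.length - 1)
    else raw
  quoteIdentifier(unquoted)
}
```

Doing this at option-parsing time keeps the quoting decision in one place, instead of scattering `quoteIdentifier` calls through `JDBCRelation`.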
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371 **[Test build #4075 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4075/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20370: Changing JDBC relation to better process quotes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20370 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371 **[Test build #4076 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4076/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371 **[Test build #4074 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4074/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/158/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371 **[Test build #4073 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4073/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20371 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20371: [SPARK-23197][DStreams] Increased timeouts to resolve fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20371 **[Test build #4072 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4072/testReport)** for PR 20371 at commit [`2446aa0`](https://github.com/apache/spark/commit/2446aa070efe43a6ab0d8adbecca335e94896a0b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20371: [SPARK-23197][DStreams] Increased timeouts to res...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20371 [SPARK-23197][DStreams] Increased timeouts to resolve flakiness ## What changes were proposed in this pull request? Increased timeout from 50 ms to 300 ms (50 ms was really too low). ## How was this patch tested? Multiple rounds of tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-23197 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20371.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20371 commit 2446aa070efe43a6ab0d8adbecca335e94896a0b Author: Tathagata Das Date: 2018-01-23T22:49:20Z increased timeouts
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20365 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20365 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86544/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20370: Changing JDBC relation to better process quotes
GitHub user conorbmurphy opened a pull request: https://github.com/apache/spark/pull/20370 Changing JDBC relation to better process quotes ## What changes were proposed in this pull request? As JDBC writes currently work, they do not properly account for mixed-case column names; instead, the user has to put quotes around each column name. This change avoids that. ## How was this patch tested? Manual tests and working with @dougbateman and @gatorsmile Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/conorbmurphy/spark-1 master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20370.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20370 commit d2864d06c039cb0c0b0c9d9271c9757309017e1a Author: conorbmurphy Date: 2018-01-23T22:43:32Z Changing JDBC relation to better process quotes
[GitHub] spark pull request #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20368#discussion_r163402210 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -63,7 +63,7 @@ case class InMemoryRelation( tableName: Option[String])( @transient var _cachedColumnBuffers: RDD[CachedBatch] = null, val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator, -statsOfPlanToCache: Statistics = null) +statsOfPlanToCache: Statistics) --- End diff -- Leaving no default value is fine; we do not need any default value, actually.
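The trade-off being discussed can be shown with simplified stand-ins (these are not Spark's actual classes): dropping the `= null` default makes a forgotten statistics argument a compile-time error rather than a latent `NullPointerException` when the statistics are later read.

```scala
// Simplified illustration only; not Spark's InMemoryRelation/Statistics.
case class StatisticsStub(sizeInBytes: BigInt)

// No default value: every caller must supply statistics explicitly,
// so statsOfPlanToCache can never be null by accident.
class InMemoryRelationStub(val statsOfPlanToCache: StatisticsStub) {
  def computeStats(): StatisticsStub = statsOfPlanToCache // safe by construction
}
```

With a `= null` default, a call site that omits the argument compiles cleanly and only fails at runtime, which is exactly the risk gatorsmile raises above.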
[GitHub] spark issue #20365: [SPARK-23192] [SQL] Keep the Hint after Using Cached Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20365 **[Test build #86544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86544/testReport)** for PR 20365 at commit [`1186ef5`](https://github.com/apache/spark/commit/1186ef5e38a34ff77fa62521de0da73666b0de96). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r163401858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -60,7 +62,8 @@ case class InMemoryRelation( @transient child: SparkPlan, tableName: Option[String])( @transient var _cachedColumnBuffers: RDD[CachedBatch] = null, -val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator) +val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator, +statsOfPlanToCache: Statistics = null) --- End diff -- eh...we do not have other options, it's more like a placeholder, since InMemoryRelation is created by CacheManager through apply() in companion object it's no harm here IMHO --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20369 **[Test build #86549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86549/testReport)** for PR 20369 at commit [`d722bbf`](https://github.com/apache/spark/commit/d722bbf2f253dff0b7da0111b4e75529dc591813). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20369 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20369: [SPARK-23196] Unify continuous and microbatch V2 ...
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/20369 [SPARK-23196] Unify continuous and microbatch V2 sinks ## What changes were proposed in this pull request? Replace streaming V2 sinks with a unified StreamWriteSupport interface, with a shim to use it with microbatch execution. Add a new SQL config to use for disabling V2 sinks, falling back to the V1 sink implementation. ## How was this patch tested? Existing tests, which in the case of Kafka (the only existing continuous V2 sink) now use V2 for microbatch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jose-torres/spark streaming-sink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20369.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20369 commit 94c06a5f9a9d88810a43ac66722f58ffa45709f0 Author: Jose Torres Date: 2018-01-23T20:47:44Z change sink commit ee773f4cc7d6cfbb14b40c2e7961386ea2742612 Author: Jose Torres Date: 2018-01-23T21:12:20Z add config commit d722bbf2f253dff0b7da0111b4e75529dc591813 Author: Jose Torres Date: 2018-01-23T22:07:11Z fix internal row
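The "unified interface plus a microbatch shim" idea can be sketched with toy types (names here are illustrative, not the actual DataSourceV2 API): a single epoch-based writer serves continuous execution directly, while microbatch execution reuses it by treating each batch id as an epoch.

```scala
import scala.collection.mutable

// Toy stand-in for a unified streaming write interface.
trait StreamWriterStub {
  def commit(epochId: Long, rows: Seq[String]): Unit
}

// One writer implementation, shared by both execution modes.
class LogWriter extends StreamWriterStub {
  val committed = mutable.Map.empty[Long, Seq[String]]
  def commit(epochId: Long, rows: Seq[String]): Unit = committed(epochId) = rows
}

// Microbatch shim: each batch id is passed through as an epoch id,
// so the same writer works without a separate microbatch sink class.
class MicroBatchShim(writer: StreamWriterStub) {
  def commitBatch(batchId: Long, rows: Seq[String]): Unit = writer.commit(batchId, rows)
}
```

Under this shape, a source like Kafka needs only one writer implementation, which is why the PR's existing microbatch tests can exercise the V2 path.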
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20285 Hi, @mengxr. Could you resolve the JIRA, too? - https://issues.apache.org/jira/browse/SPARK-22735 Thanks!
[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20285 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20285 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86548/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20285 **[Test build #86548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86548/testReport)** for PR 20285 at commit [`3055eec`](https://github.com/apache/spark/commit/3055eec72bb71e7fe7d586903fbf8ea57a70fa82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/20285 LGTM. Merged into master and branch-2.3. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r163390328 --- Diff: docs/ml-features.md --- @@ -1283,6 +1283,56 @@ for more details on the API. +## VectorSizeHint + +It can sometimes be useful to explicitly specify the size of the vectors for a column of +`VectorType`. For example, `VectorAssembler` uses size information from its input columns to +produce size information and metadata for its output column. While in some cases this information +can be obtained by inspecting the contents of the column, in a streaming dataframe the contents are +not available until the stream is started. `VectorSizeHint` allows a user to explicitly specify the +vector size for a column so that `VectorAssembler`, or other transformers that might +need to know vector size, can use that column as an input. + +To use `VectorSizeHint` a user must set the `inputCol` and `size` parameters. Applying this +transformer to a dataframe produces a new dataframe with updated metadata for `inputCol` specifying +the vector size. Downstream operations on the resulting dataframe can get this size using the +metadata. + +`VectorSizeHint` can also take an optional `handleInvalid` parameter which controls its +behaviour when the vector column contains nulls or vectors of the wrong size. By default +`handleInvalid` is set to "error", indicating an exception should be thrown. This parameter can +also be set to "skip", indicating that rows containing invalid values should be filtered out from +the resulting dataframe, or "optimistic" indicating that all rows should be kept. When +`handleInvalid` is set to "optimistic" the user takes responsibility for ensuring that the column +does not have invalid values, values that don't match the column's metadata, or dealing with those +invalid values downstream. +Refer to the [VectorSizeHint Scala docs](api/scala/index.html#org.apache.spark.ml.feature.VectorSizeHint) --- End diff -- [screenshot: https://user-images.githubusercontent.com/223219/35302985-523f-0045-11e8-9a21-c4ed795b6e6a.png] I don't think so :), but I think we should leave it to be consistent with other examples.
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20285 **[Test build #86548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86548/testReport)** for PR 20285 at commit [`3055eec`](https://github.com/apache/spark/commit/3055eec72bb71e7fe7d586903fbf8ea57a70fa82). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/157/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20285 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r163389908 --- Diff: docs/ml-features.md --- @@ -1283,6 +1283,56 @@ for more details on the API. +## VectorSizeHint + +It can sometimes be useful to explicitly specify the size of the vectors for a column of +`VectorType`. For example, `VectorAssembler` uses size information from its input columns to +produce size information and metadata for its output column. While in some cases this information +can be obtained by inspecting the contents of the column, in a streaming dataframe the contents are +not available until the stream is started. `VectorSizeHint` allows a user to explicitly specify the +vector size for a column so that `VectorAssembler`, or other transformers that might +need to know vector size, can use that column as an input. + +To use `VectorSizeHint` a user must set the `inputCol` and `size` parameters. Applying this +transformer to a dataframe produces a new dataframe with updated metadata for `inputCol` specifying +the vector size. Downstream operations on the resulting dataframe can get this size using the +metadata. + +`VectorSizeHint` can also take an optional `handleInvalid` parameter which controls its +behaviour when the vector column contains nulls or vectors of the wrong size. By default +`handleInvalid` is set to "error", indicating an exception should be thrown. This parameter can +also be set to "skip", indicating that rows containing invalid values should be filtered out from +the resulting dataframe, or "optimistic" indicating that all rows should be kept. When +`handleInvalid` is set to "optimistic" the user takes responsibility for ensuring that the column --- End diff -- I've updated it, let me know if you think we can still make it more clear.
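The quoted doc's key design point — that `VectorSizeHint` records the size in column *metadata* rather than rewriting the data, so downstream consumers can read it without scanning rows — can be sketched in plain Python. The dict layout here is a hypothetical illustration, not Spark's actual metadata format:

```python
# Hypothetical sketch of the metadata idea behind VectorSizeHint: the data is
# left untouched; only the declared size is attached as column metadata, so a
# downstream consumer (e.g. an assembler) can read it without scanning rows.

def with_size_hint(column, size):
    """Return a column wrapper with the vector size recorded in metadata."""
    return {"data": column, "metadata": {"vector_size": size}}

def declared_size(column):
    """Read the declared size back from metadata, without touching the data."""
    return column["metadata"]["vector_size"]

col = with_size_hint([[0.0, 1.0], [2.0, 3.0]], size=2)
print(declared_size(col))  # 2
print(col["data"])         # [[0.0, 1.0], [2.0, 3.0]] -- data unchanged
```

This is why the hint works on streaming dataframes: the size is declared up front instead of inferred from contents that aren't available until the stream starts.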
[GitHub] spark issue #20335: [SPARK-23088][CORE] History server not showing incomplet...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20335 This is a behavior change; it may be useful in your case, but we have to make sure it doesn't cause any regressions in other scenarios. cc @gengliangwang
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20361 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/156/ Test PASSed.
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20361 Merged build finished. Test PASSed.
[GitHub] spark issue #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20368 LGTM
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20361 **[Test build #86547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86547/testReport)** for PR 20361 at commit [`38debd7`](https://github.com/apache/spark/commit/38debd7957fc2376b92cac5ae6ad1b0b78fb33c2).
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20361 retest this please
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20361 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86542/ Test FAILed.
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20361 Merged build finished. Test FAILed.
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20361 **[Test build #86542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86542/testReport)** for PR 20361 at commit [`38debd7`](https://github.com/apache/spark/commit/38debd7957fc2376b92cac5ae6ad1b0b78fb33c2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.