[jira] [Commented] (SPARK-23299) __repr__ broken for Rows instantiated with *args

2018-02-03 Thread Shashwat Anand (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351670#comment-16351670
 ] 

Shashwat Anand commented on SPARK-23299:


Seems like a valid issue to me. [~srowen], should I send a PR for this?

> __repr__ broken for Rows instantiated with *args
> 
>
> Key: SPARK-23299
> URL: https://issues.apache.org/jira/browse/SPARK-23299
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0, 2.2.0
> Environment: Tested on OS X with Spark 1.5.0 as well as pip-installed 
> `pyspark` 2.2.0. Code in question appears to still be in error on the master 
> branch of the GitHub repository.
>Reporter: Oli Hall
>Priority: Minor
>
> PySpark Rows throw an exception if instantiated without column names when 
> `__repr__` is called. The most minimal reproducible example I've found is 
> this:
> {code:java}
> >>> from pyspark.sql.types import Row
> >>> Row(123)
> TypeError                                 Traceback (most recent call last)
> /lib/python2.7/site-packages/pyspark/sql/types.pyc in __repr__(self)
> -> 1524             return "<Row(%s)>" % ", ".join(self)
> TypeError: sequence item 0: expected string, int found{code}
> This appears to be due to the implementation of `__repr__`, which works 
> excellently for Rows created with column names, but for those without, 
> assumes all values are strings ([link 
> here|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1584]).
> This should be an easy fix: if the values are mapped to `str` first, all 
> should be well (the last line is the only modification):
> {code:java}
> def __repr__(self):
>     """Printable representation of Row used in Python REPL."""
>     if hasattr(self, "__fields__"):
>         return "Row(%s)" % ", ".join("%s=%r" % (k, v)
>                                      for k, v in zip(self.__fields__, tuple(self)))
>     else:
>         return "<Row(%s)>" % ", ".join(map(str, self))
> {code}
> This will yield the following:
> {code:java}
> >>> from pyspark.sql.types import Row
> >>> Row('aaa', 123)
> <Row(aaa, 123)>
> {code}
>   
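
A minimal, self-contained sketch of the behavior the fix produces (MiniRow
below is a hypothetical stand-in for pyspark.sql.types.Row, used only to show
why ", ".join(self) fails on non-string values while map(str, self) does not):

{code:python}
# Sketch only: MiniRow mimics the relevant part of pyspark.sql.types.Row.
class MiniRow(tuple):
    def __repr__(self):
        if hasattr(self, "__fields__"):
            return "Row(%s)" % ", ".join("%s=%r" % (k, v)
                                         for k, v in zip(self.__fields__,
                                                         tuple(self)))
        # str.join() requires every item to be a string; mapping to str first
        # avoids "TypeError: sequence item 0: expected string, int found".
        return "<Row(%s)>" % ", ".join(map(str, self))

print(repr(MiniRow(("aaa", 123))))  # <Row(aaa, 123)>
{code}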



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23330) Spark UI SQL executions page throws NPE

2018-02-03 Thread Jiang Xingbo (JIRA)
Jiang Xingbo created SPARK-23330:


 Summary: Spark UI SQL executions page throws NPE
 Key: SPARK-23330
 URL: https://issues.apache.org/jira/browse/SPARK-23330
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.0
Reporter: Jiang Xingbo


Spark UI SQL executions page throws the following error and the page crashes:
{code}
HTTP ERROR 500
Problem accessing /SQL/. Reason:

Server Error
Caused by:
java.lang.NullPointerException
  at scala.collection.immutable.StringOps$.length$extension(StringOps.scala:47)
  at scala.collection.immutable.StringOps.length(StringOps.scala:47)
  at scala.collection.IndexedSeqOptimized$class.isEmpty(IndexedSeqOptimized.scala:27)
  at scala.collection.immutable.StringOps.isEmpty(StringOps.scala:29)
  at scala.collection.TraversableOnce$class.nonEmpty(TraversableOnce.scala:111)
  at scala.collection.immutable.StringOps.nonEmpty(StringOps.scala:29)
  at org.apache.spark.sql.execution.ui.ExecutionTable.descriptionCell(AllExecutionsPage.scala:182)
  at org.apache.spark.sql.execution.ui.ExecutionTable.row(AllExecutionsPage.scala:155)
  at org.apache.spark.sql.execution.ui.ExecutionTable$$anonfun$8.apply(AllExecutionsPage.scala:204)
  at org.apache.spark.sql.execution.ui.ExecutionTable$$anonfun$8.apply(AllExecutionsPage.scala:204)
  at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:339)
  at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:339)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.spark.ui.UIUtils$.listingTable(UIUtils.scala:339)
  at org.apache.spark.sql.execution.ui.ExecutionTable.toNodeSeq(AllExecutionsPage.scala:203)
  at org.apache.spark.sql.execution.ui.AllExecutionsPage.render(AllExecutionsPage.scala:67)
  at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
  at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
  at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
  at org.eclipse.jetty.server.Server.handle(Server.java:534)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
  at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
  at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
  at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
  at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
  at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
  at java.lang.Thread.run(Thread.java:748)
{code}
The bug seems to have been introduced by 
https://github.com/apache/spark/pull/19681/files#diff-a74d84702d8d47d5269e96740a55a3caR63



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-22430) Unknown tag warnings when building R docs with Roxygen 6.0.1

2018-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22430:


Assignee: Apache Spark

> Unknown tag warnings when building R docs with Roxygen 6.0.1
> 
>
> Key: SPARK-22430
> URL: https://issues.apache.org/jira/browse/SPARK-22430
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
> Environment: Roxygen 6.0.1
>Reporter: Joel Croteau
>Assignee: Apache Spark
>Priority: Trivial
>
> When building R docs using create-rd.sh with Roxygen 6.0.1, a large number of 
> unknown tag warnings are generated:
> {noformat}
> Warning: @export [schema.R#33]: unknown tag
> Warning: @export [schema.R#53]: unknown tag
> Warning: @export [schema.R#63]: unknown tag
> Warning: @export [schema.R#80]: unknown tag
> Warning: @export [schema.R#123]: unknown tag
> Warning: @export [schema.R#141]: unknown tag
> Warning: @export [schema.R#216]: unknown tag
> Warning: @export [generics.R#388]: unknown tag
> Warning: @export [generics.R#403]: unknown tag
> Warning: @export [generics.R#407]: unknown tag
> Warning: @export [generics.R#414]: unknown tag
> Warning: @export [generics.R#418]: unknown tag
> Warning: @export [generics.R#422]: unknown tag
> Warning: @export [generics.R#428]: unknown tag
> Warning: @export [generics.R#432]: unknown tag
> Warning: @export [generics.R#438]: unknown tag
> Warning: @export [generics.R#442]: unknown tag
> Warning: @export [generics.R#446]: unknown tag
> Warning: @export [generics.R#450]: unknown tag
> Warning: @export [generics.R#454]: unknown tag
> Warning: @export [generics.R#459]: unknown tag
> Warning: @export [generics.R#467]: unknown tag
> Warning: @export [generics.R#475]: unknown tag
> Warning: @export [generics.R#479]: unknown tag
> Warning: @export [generics.R#483]: unknown tag
> Warning: @export [generics.R#487]: unknown tag
> Warning: @export [generics.R#498]: unknown tag
> Warning: @export [generics.R#502]: unknown tag
> Warning: @export [generics.R#506]: unknown tag
> Warning: @export [generics.R#512]: unknown tag
> Warning: @export [generics.R#518]: unknown tag
> Warning: @export [generics.R#526]: unknown tag
> Warning: @export [generics.R#530]: unknown tag
> Warning: @export [generics.R#534]: unknown tag
> Warning: @export [generics.R#538]: unknown tag
> Warning: @export [generics.R#542]: unknown tag
> Warning: @export [generics.R#549]: unknown tag
> Warning: @export [generics.R#556]: unknown tag
> Warning: @export [generics.R#560]: unknown tag
> Warning: @export [generics.R#567]: unknown tag
> Warning: @export [generics.R#571]: unknown tag
> Warning: @export [generics.R#575]: unknown tag
> Warning: @export [generics.R#579]: unknown tag
> Warning: @export [generics.R#583]: unknown tag
> Warning: @export [generics.R#587]: unknown tag
> Warning: @export [generics.R#591]: unknown tag
> Warning: @export [generics.R#595]: unknown tag
> Warning: @export [generics.R#599]: unknown tag
> Warning: @export [generics.R#603]: unknown tag
> Warning: @export [generics.R#607]: unknown tag
> Warning: @export [generics.R#611]: unknown tag
> Warning: @export [generics.R#615]: unknown tag
> Warning: @export [generics.R#619]: unknown tag
> Warning: @export [generics.R#623]: unknown tag
> Warning: @export [generics.R#627]: unknown tag
> Warning: @export [generics.R#631]: unknown tag
> Warning: @export [generics.R#635]: unknown tag
> Warning: @export [generics.R#639]: unknown tag
> Warning: @export [generics.R#643]: unknown tag
> Warning: @export [generics.R#647]: unknown tag
> Warning: @export [generics.R#654]: unknown tag
> Warning: @export [generics.R#658]: unknown tag
> Warning: @export [generics.R#663]: unknown tag
> Warning: @export [generics.R#667]: unknown tag
> Warning: @export [generics.R#672]: unknown tag
> Warning: @export [generics.R#676]: unknown tag
> Warning: @export [generics.R#680]: unknown tag
> Warning: @export [generics.R#684]: unknown tag
> Warning: @export [generics.R#690]: unknown tag
> Warning: @export [generics.R#696]: unknown tag
> Warning: @export [generics.R#702]: unknown tag
> Warning: @export [generics.R#706]: unknown tag
> Warning: @export [generics.R#710]: unknown tag
> Warning: @export [generics.R#716]: unknown tag
> Warning: @export [generics.R#720]: unknown tag
> Warning: @export [generics.R#726]: unknown tag
> Warning: @export [generics.R#730]: unknown tag
> Warning: @export [generics.R#734]: unknown tag
> Warning: @export [generics.R#738]: unknown tag
> Warning: @export [generics.R#742]: unknown tag
> Warning: @export [generics.R#750]: unknown tag
> Warning: @export [generics.R#754]: unknown tag
> Warning: @export [generics.R#758]: unknown tag
> Warning: @export [generics.R#766]: unknown tag
> Warning: @export [generics.R#770]: unknown tag

[jira] [Commented] (SPARK-22430) Unknown tag warnings when building R docs with Roxygen 6.0.1

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351622#comment-16351622
 ] 

Apache Spark commented on SPARK-22430:
--

User 'rekhajoshm' has created a pull request for this issue:
https://github.com/apache/spark/pull/20501

> Unknown tag warnings when building R docs with Roxygen 6.0.1
> 
>
> Key: SPARK-22430
> URL: https://issues.apache.org/jira/browse/SPARK-22430
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
> Environment: Roxygen 6.0.1
>Reporter: Joel Croteau
>Priority: Trivial
>
> When building R docs using create-rd.sh with Roxygen 6.0.1, a large number of 
> unknown tag warnings are generated:
> {noformat}
> Warning: @export [schema.R#33]: unknown tag
> Warning: @export [schema.R#53]: unknown tag
> Warning: @export [schema.R#63]: unknown tag
> Warning: @export [schema.R#80]: unknown tag
> Warning: @export [schema.R#123]: unknown tag
> Warning: @export [schema.R#141]: unknown tag
> Warning: @export [schema.R#216]: unknown tag
> Warning: @export [generics.R#388]: unknown tag
> Warning: @export [generics.R#403]: unknown tag
> Warning: @export [generics.R#407]: unknown tag
> Warning: @export [generics.R#414]: unknown tag
> Warning: @export [generics.R#418]: unknown tag
> Warning: @export [generics.R#422]: unknown tag
> Warning: @export [generics.R#428]: unknown tag
> Warning: @export [generics.R#432]: unknown tag
> Warning: @export [generics.R#438]: unknown tag
> Warning: @export [generics.R#442]: unknown tag
> Warning: @export [generics.R#446]: unknown tag
> Warning: @export [generics.R#450]: unknown tag
> Warning: @export [generics.R#454]: unknown tag
> Warning: @export [generics.R#459]: unknown tag
> Warning: @export [generics.R#467]: unknown tag
> Warning: @export [generics.R#475]: unknown tag
> Warning: @export [generics.R#479]: unknown tag
> Warning: @export [generics.R#483]: unknown tag
> Warning: @export [generics.R#487]: unknown tag
> Warning: @export [generics.R#498]: unknown tag
> Warning: @export [generics.R#502]: unknown tag
> Warning: @export [generics.R#506]: unknown tag
> Warning: @export [generics.R#512]: unknown tag
> Warning: @export [generics.R#518]: unknown tag
> Warning: @export [generics.R#526]: unknown tag
> Warning: @export [generics.R#530]: unknown tag
> Warning: @export [generics.R#534]: unknown tag
> Warning: @export [generics.R#538]: unknown tag
> Warning: @export [generics.R#542]: unknown tag
> Warning: @export [generics.R#549]: unknown tag
> Warning: @export [generics.R#556]: unknown tag
> Warning: @export [generics.R#560]: unknown tag
> Warning: @export [generics.R#567]: unknown tag
> Warning: @export [generics.R#571]: unknown tag
> Warning: @export [generics.R#575]: unknown tag
> Warning: @export [generics.R#579]: unknown tag
> Warning: @export [generics.R#583]: unknown tag
> Warning: @export [generics.R#587]: unknown tag
> Warning: @export [generics.R#591]: unknown tag
> Warning: @export [generics.R#595]: unknown tag
> Warning: @export [generics.R#599]: unknown tag
> Warning: @export [generics.R#603]: unknown tag
> Warning: @export [generics.R#607]: unknown tag
> Warning: @export [generics.R#611]: unknown tag
> Warning: @export [generics.R#615]: unknown tag
> Warning: @export [generics.R#619]: unknown tag
> Warning: @export [generics.R#623]: unknown tag
> Warning: @export [generics.R#627]: unknown tag
> Warning: @export [generics.R#631]: unknown tag
> Warning: @export [generics.R#635]: unknown tag
> Warning: @export [generics.R#639]: unknown tag
> Warning: @export [generics.R#643]: unknown tag
> Warning: @export [generics.R#647]: unknown tag
> Warning: @export [generics.R#654]: unknown tag
> Warning: @export [generics.R#658]: unknown tag
> Warning: @export [generics.R#663]: unknown tag
> Warning: @export [generics.R#667]: unknown tag
> Warning: @export [generics.R#672]: unknown tag
> Warning: @export [generics.R#676]: unknown tag
> Warning: @export [generics.R#680]: unknown tag
> Warning: @export [generics.R#684]: unknown tag
> Warning: @export [generics.R#690]: unknown tag
> Warning: @export [generics.R#696]: unknown tag
> Warning: @export [generics.R#702]: unknown tag
> Warning: @export [generics.R#706]: unknown tag
> Warning: @export [generics.R#710]: unknown tag
> Warning: @export [generics.R#716]: unknown tag
> Warning: @export [generics.R#720]: unknown tag
> Warning: @export [generics.R#726]: unknown tag
> Warning: @export [generics.R#730]: unknown tag
> Warning: @export [generics.R#734]: unknown tag
> Warning: @export [generics.R#738]: unknown tag
> Warning: @export [generics.R#742]: unknown tag
> Warning: @export [generics.R#750]: unknown tag
> Warning: @export [generics.R#754]: unknown tag
> Warning: @export [generics.R#758]: unknown tag
> Warning: @export [generics.R#766]: unknown tag

[jira] [Assigned] (SPARK-22430) Unknown tag warnings when building R docs with Roxygen 6.0.1

2018-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22430:


Assignee: (was: Apache Spark)

> Unknown tag warnings when building R docs with Roxygen 6.0.1
> 
>
> Key: SPARK-22430
> URL: https://issues.apache.org/jira/browse/SPARK-22430
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
> Environment: Roxygen 6.0.1
>Reporter: Joel Croteau
>Priority: Trivial
>
> When building R docs using create-rd.sh with Roxygen 6.0.1, a large number of 
> unknown tag warnings are generated:
> {noformat}
> Warning: @export [schema.R#33]: unknown tag
> Warning: @export [schema.R#53]: unknown tag
> Warning: @export [schema.R#63]: unknown tag
> Warning: @export [schema.R#80]: unknown tag
> Warning: @export [schema.R#123]: unknown tag
> Warning: @export [schema.R#141]: unknown tag
> Warning: @export [schema.R#216]: unknown tag
> Warning: @export [generics.R#388]: unknown tag
> Warning: @export [generics.R#403]: unknown tag
> Warning: @export [generics.R#407]: unknown tag
> Warning: @export [generics.R#414]: unknown tag
> Warning: @export [generics.R#418]: unknown tag
> Warning: @export [generics.R#422]: unknown tag
> Warning: @export [generics.R#428]: unknown tag
> Warning: @export [generics.R#432]: unknown tag
> Warning: @export [generics.R#438]: unknown tag
> Warning: @export [generics.R#442]: unknown tag
> Warning: @export [generics.R#446]: unknown tag
> Warning: @export [generics.R#450]: unknown tag
> Warning: @export [generics.R#454]: unknown tag
> Warning: @export [generics.R#459]: unknown tag
> Warning: @export [generics.R#467]: unknown tag
> Warning: @export [generics.R#475]: unknown tag
> Warning: @export [generics.R#479]: unknown tag
> Warning: @export [generics.R#483]: unknown tag
> Warning: @export [generics.R#487]: unknown tag
> Warning: @export [generics.R#498]: unknown tag
> Warning: @export [generics.R#502]: unknown tag
> Warning: @export [generics.R#506]: unknown tag
> Warning: @export [generics.R#512]: unknown tag
> Warning: @export [generics.R#518]: unknown tag
> Warning: @export [generics.R#526]: unknown tag
> Warning: @export [generics.R#530]: unknown tag
> Warning: @export [generics.R#534]: unknown tag
> Warning: @export [generics.R#538]: unknown tag
> Warning: @export [generics.R#542]: unknown tag
> Warning: @export [generics.R#549]: unknown tag
> Warning: @export [generics.R#556]: unknown tag
> Warning: @export [generics.R#560]: unknown tag
> Warning: @export [generics.R#567]: unknown tag
> Warning: @export [generics.R#571]: unknown tag
> Warning: @export [generics.R#575]: unknown tag
> Warning: @export [generics.R#579]: unknown tag
> Warning: @export [generics.R#583]: unknown tag
> Warning: @export [generics.R#587]: unknown tag
> Warning: @export [generics.R#591]: unknown tag
> Warning: @export [generics.R#595]: unknown tag
> Warning: @export [generics.R#599]: unknown tag
> Warning: @export [generics.R#603]: unknown tag
> Warning: @export [generics.R#607]: unknown tag
> Warning: @export [generics.R#611]: unknown tag
> Warning: @export [generics.R#615]: unknown tag
> Warning: @export [generics.R#619]: unknown tag
> Warning: @export [generics.R#623]: unknown tag
> Warning: @export [generics.R#627]: unknown tag
> Warning: @export [generics.R#631]: unknown tag
> Warning: @export [generics.R#635]: unknown tag
> Warning: @export [generics.R#639]: unknown tag
> Warning: @export [generics.R#643]: unknown tag
> Warning: @export [generics.R#647]: unknown tag
> Warning: @export [generics.R#654]: unknown tag
> Warning: @export [generics.R#658]: unknown tag
> Warning: @export [generics.R#663]: unknown tag
> Warning: @export [generics.R#667]: unknown tag
> Warning: @export [generics.R#672]: unknown tag
> Warning: @export [generics.R#676]: unknown tag
> Warning: @export [generics.R#680]: unknown tag
> Warning: @export [generics.R#684]: unknown tag
> Warning: @export [generics.R#690]: unknown tag
> Warning: @export [generics.R#696]: unknown tag
> Warning: @export [generics.R#702]: unknown tag
> Warning: @export [generics.R#706]: unknown tag
> Warning: @export [generics.R#710]: unknown tag
> Warning: @export [generics.R#716]: unknown tag
> Warning: @export [generics.R#720]: unknown tag
> Warning: @export [generics.R#726]: unknown tag
> Warning: @export [generics.R#730]: unknown tag
> Warning: @export [generics.R#734]: unknown tag
> Warning: @export [generics.R#738]: unknown tag
> Warning: @export [generics.R#742]: unknown tag
> Warning: @export [generics.R#750]: unknown tag
> Warning: @export [generics.R#754]: unknown tag
> Warning: @export [generics.R#758]: unknown tag
> Warning: @export [generics.R#766]: unknown tag
> Warning: @export [generics.R#770]: unknown tag
> Warning: 

[jira] [Commented] (SPARK-22711) _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py

2018-02-03 Thread Prateek (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351601#comment-16351601
 ] 

Prateek commented on SPARK-22711:
-

Thanks. The workaround works fine.

I wonder why the *setup_environment* method did not take care of this. Is it 
that on each pass of map or flatMap the master assigns tasks to worker nodes 
and the workers are effectively refreshed (so the setup was temporary), or 
does it assign the work to different worker nodes than the ones used for 
setup_environment?
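
For context, a minimal sketch of the usual cause and workaround for this class
of PicklingError (BigModel is a hypothetical stand-in for whatever driver-side
object the lambda's closure was dragging into the pickled task):

{code:python}
# A lambda that closes over an unpicklable or heavy object forces cloudpickle
# to serialize that object with every task. Building the object lazily inside
# the function that runs on the executor avoids shipping it from the driver.

class BigModel(object):
    """Stand-in for an object that is expensive (or impossible) to pickle."""
    def summarize(self, text):
        return (text.split()[0], 1)

_model = None

def get_summary(text):
    global _model
    if _model is None:      # built once per executor process, never pickled
        _model = BigModel()
    return _model.summarize(text)

# Usage with an RDD would look like (sketch):
#   rdd.flatMap(tokenize).map(get_summary).reduceByKey(lambda c, v: c + v)
print(get_summary("hello world"))  # ('hello', 1)
{code}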

> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from 
> cloudpickle.py
> ---
>
> Key: SPARK-22711
> URL: https://issues.apache.org/jira/browse/SPARK-22711
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Submit
>Affects Versions: 2.2.0, 2.2.1
> Environment: Ubuntu pseudo distributed installation of Spark 2.2.0
>Reporter: Prateek
>Priority: Major
> Attachments: Jira_Spark_minimized_code.py
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When I submit a PySpark program with the spark-submit command, this error is 
> thrown.
> It happens for code like the following:
> RDD2 = RDD1.map(lambda m: function_x(m)).reduceByKey(lambda c,v :c+v)
> or 
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduceByKey(lambda c,v :c+v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduce(lambda c,v :c+v)
> Traceback (most recent call last):
>   File "/home/prateek/Project/textrank.py", line 299, in <module>
> summaryRDD = sentenceTokensReduceRDD.map(lambda m: 
> get_summary(m)).reduceByKey(lambda c,v :c+v)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1608, 
> in reduceByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1846, 
> in combineByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1783, 
> in partitionBy
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2455, 
> in _jrdd
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2388, 
> in _wrap_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2374, 
> in _prepare_for_python_RDD
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 
> 460, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 704, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 148, in dump
>   File "/usr/lib/python3.5/pickle.py", line 408, in dump
> self.save(obj)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 740, in save_tuple
> save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
> save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
> self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
> save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
> save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
> self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
> save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 
> 292, in save_function_tuple
>   File 

[jira] [Commented] (SPARK-14023) Make exceptions consistent regarding fields and columns

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351598#comment-16351598
 ] 

Apache Spark commented on SPARK-14023:
--

User 'rekhajoshm' has created a pull request for this issue:
https://github.com/apache/spark/pull/20500

> Make exceptions consistent regarding fields and columns
> ---
>
> Key: SPARK-14023
> URL: https://issues.apache.org/jira/browse/SPARK-14023
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> As you can see below, a column is called a "field" or a "column" depending on 
> where the exception is thrown. I think it should be "column" everywhere (since 
> that's what has a type from a schema).
> {code}
> scala> lr
> res32: org.apache.spark.ml.regression.LinearRegression = linReg_d9bfe808e743
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: Field "features" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at org.apache.spark.sql.types.StructType.apply(StructType.scala:213)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:50)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: requirement failed: Column label must be 
> of type DoubleType but was actually StringType.
>   at scala.Predef$.require(Predef.scala:219)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14023) Make exceptions consistent regarding fields and columns

2018-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14023:


Assignee: (was: Apache Spark)

> Make exceptions consistent regarding fields and columns
> ---
>
> Key: SPARK-14023
> URL: https://issues.apache.org/jira/browse/SPARK-14023
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> As you can see below, a column is called a "field" or a "column" depending on 
> where the exception is thrown. I think it should be "column" everywhere (since 
> that's what has a type from a schema).
> {code}
> scala> lr
> res32: org.apache.spark.ml.regression.LinearRegression = linReg_d9bfe808e743
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: Field "features" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at org.apache.spark.sql.types.StructType.apply(StructType.scala:213)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:50)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: requirement failed: Column label must be 
> of type DoubleType but was actually StringType.
>   at scala.Predef$.require(Predef.scala:219)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14023) Make exceptions consistent regarding fields and columns

2018-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14023:


Assignee: Apache Spark

> Make exceptions consistent regarding fields and columns
> ---
>
> Key: SPARK-14023
> URL: https://issues.apache.org/jira/browse/SPARK-14023
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Apache Spark
>Priority: Trivial
>
> As you can see below, a column is called a "field" or a "column" depending on 
> where the exception is thrown. I think it should be "column" everywhere (since 
> that's what has a type from a schema).
> {code}
> scala> lr
> res32: org.apache.spark.ml.regression.LinearRegression = linReg_d9bfe808e743
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: Field "features" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at org.apache.spark.sql.types.StructType.apply(StructType.scala:213)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:50)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: requirement failed: Column label must be 
> of type DoubleType but was actually StringType.
>   at scala.Predef$.require(Predef.scala:219)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23329) Update the function descriptions with the arguments and returned values of the trigonometric functions

2018-02-03 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-23329:
--
  Labels: starter  (was: )
Priority: Minor  (was: Major)

Agree. At the least, this should say that the argument is in radians. For cases 
where the semantics are more complicated, like tanh, it's probably fine to say 
"this works as java.lang.Math does".

> Update the function descriptions with the arguments and returned values of 
> the trigonometric functions
> --
>
> Key: SPARK-23329
> URL: https://issues.apache.org/jira/browse/SPARK-23329
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Minor
>  Labels: starter
>
> We need an update on the function descriptions for all the trigonometric 
> functions, for example {{cos}}, {{sin}}, and {{cot}}. Internally, the 
> implementation is based on java.lang.Math. We need a clear description 
> of the units of the input arguments and of the returned values. 
> For example, the following descriptions are lacking such info. 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L551-L555
> https://github.com/apache/spark/blob/d5861aba9d80ca15ad3f22793b79822e470d6913/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1978



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21658:
-
Fix Version/s: (was: 2.3.0)

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases: 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23329) Update the function descriptions with the arguments and returned values of the trigonometric functions

2018-02-03 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23329:
---

 Summary: Update the function descriptions with the arguments and 
returned values of the trigonometric functions
 Key: SPARK-23329
 URL: https://issues.apache.org/jira/browse/SPARK-23329
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li


We need an update on the function descriptions for all the trigonometric 
functions, for example {{cos}}, {{sin}}, and {{cot}}. Internally, the 
implementation is based on java.lang.Math. We need a clear description 
of the units of the input arguments and of the returned values. 

For example, the following descriptions are lacking such info. 

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L551-L555

https://github.com/apache/spark/blob/d5861aba9d80ca15ad3f22793b79822e470d6913/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1978
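
As a quick illustration of the ambiguity (a PySpark sketch, not from the
ticket): these functions follow java.lang.Math and take radians, so degrees
must be converted first, which is exactly the unit information the
descriptions should state.

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[1]").appName("trig-units").getOrCreate()

spark.range(1).select(
    F.cos(F.lit(60.0)).alias("cos_of_60_radians"),          # ~ -0.952, not 0.5
    F.cos(F.toRadians(F.lit(60.0))).alias("cos_of_60_deg"),  # 0.5 after converting
).show()
{code}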




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-21658.
--
Resolution: Duplicate

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases: 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-21658:
--

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases: 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14047) GBT improvement umbrella

2018-02-03 Thread Clay Stevens (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351488#comment-16351488
 ] 

Clay Stevens edited comment on SPARK-14047 at 2/3/18 7:05 PM:
--

Is there anything that can be done to prevent the "Caused by: 
java.util.NoSuchElementException: key not found: 1.0" error that happens when 
the model test data does not have *_all_* of the feature-values or 
feature-categories that the model training data had? If it does not, Spark 
stops completely.

This is similar to the issues -SPARK-12367- and SPARK-12375 for the Random 
Forest Regressor and VectorIndexer.


was (Author: clayms):
Is there anything that can be done to prevent the "Caused by: 
java.util.NoSuchElementException: key not found: 1.0" error that happens when 
the model test data does not have *_all_* of the feature-values or 
feature-categories that the model training data had.  If it does not, Spark 
completely stops.  

This is similar to the issues -SPARK-12367- and SPARK-12375 for the 
VectorIndexer.

> GBT improvement umbrella
> 
>
> Key: SPARK-14047
> URL: https://issues.apache.org/jira/browse/SPARK-14047
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Major
>
> This is an umbrella for improvements to learning Gradient Boosted Trees: 
> GBTClassifier, GBTRegressor.
> Note: Aspects of GBTs which are related to individual trees should be listed 
> under [SPARK-14045].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23328) Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary

2018-02-03 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-23328:
-
Summary: Disallow default value None in na.replace/replace when 
'to_replace' is not a dictionary  (was: Disallow default value None when 
'to_replace' is not a dictionary)

> Disallow default value None in na.replace/replace when 'to_replace' is not a 
> dictionary
> ---
>
> Key: SPARK-23328
> URL: https://issues.apache.org/jira/browse/SPARK-23328
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We happened to set {{None}} as a default value via SPARK-19454, which is 
> quite weird.
> Looks like we should only do this when the input is a dictionary.
> Please see https://github.com/apache/spark/pull/16793#issuecomment-362684399



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14047) GBT improvement umbrella

2018-02-03 Thread Clay Stevens (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351488#comment-16351488
 ] 

Clay Stevens commented on SPARK-14047:
--

Is there anything that can be done to prevent the "Caused by: 
java.util.NoSuchElementException: key not found: 1.0" error that happens when 
the model test data does not have *_all_* of the feature-values or 
feature-categories that the model training data had? If it does not, Spark 
stops completely.

This is similar to the issues -SPARK-12367- and SPARK-12375 for the 
VectorIndexer.
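
For what it's worth, a common mitigation is to index categorical columns with
handleInvalid="keep", so labels unseen at training time map to an extra index
instead of failing at transform time (a sketch with illustrative column names;
StringIndexer has supported this option since Spark 2.2):

{code:python}
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer

spark = SparkSession.builder.master("local[1]").appName("unseen-labels").getOrCreate()

train = spark.createDataFrame([("a",), ("b",)], ["category"])
test = spark.createDataFrame([("a",), ("c",)], ["category"])  # "c" is unseen

indexer = StringIndexer(inputCol="category", outputCol="category_idx",
                        handleInvalid="keep")  # the default, "error", raises
model = indexer.fit(train)
model.transform(test).show()  # "c" receives the extra index, no exception
{code}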

> GBT improvement umbrella
> 
>
> Key: SPARK-14047
> URL: https://issues.apache.org/jira/browse/SPARK-14047
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Major
>
> This is an umbrella for improvements to learning Gradient Boosted Trees: 
> GBTClassifier, GBTRegressor.
> Note: Aspects of GBTs which are related to individual trees should be listed 
> under [SPARK-14045].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351486#comment-16351486
 ] 

Hyukjin Kwon commented on SPARK-21658:
--

This JIRA targeted matching the signature with its original to be clearer.
SPARK-23328 covers this.

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases: 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23328) Disallow default value None when 'to_replace' is not a dictionary

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351482#comment-16351482
 ] 

Apache Spark commented on SPARK-23328:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20499

> Disallow default value None when 'to_replace' is not a dictionary
> -
>
> Key: SPARK-23328
> URL: https://issues.apache.org/jira/browse/SPARK-23328
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We happened to set {{None}} as a default value via SPARK-19454, which is 
> quite weird.
> Looks like we should only do this when the input is a dictionary.
> Please see https://github.com/apache/spark/pull/16793#issuecomment-362684399



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23328) Disallow default value None when 'to_replace' is not a dictionary

2018-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23328:


Assignee: (was: Apache Spark)

> Disallow default value None when 'to_replace' is not a dictionary
> -
>
> Key: SPARK-23328
> URL: https://issues.apache.org/jira/browse/SPARK-23328
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We happened to set {{None}} as a default value via SPARK-19454, which is 
> quite weird.
> Looks like we should only do this when the input is a dictionary.
> Please see https://github.com/apache/spark/pull/16793#issuecomment-362684399



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23328) Disallow default value None when 'to_replace' is not a dictionary

2018-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23328:


Assignee: Apache Spark

> Disallow default value None when 'to_replace' is not a dictionary
> -
>
> Key: SPARK-23328
> URL: https://issues.apache.org/jira/browse/SPARK-23328
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> We happened to set {{None}} as a default value via SPARK-19454, which is 
> quite weird.
> Looks like we should only do this when the input is a dictionary.
> Please see https://github.com/apache/spark/pull/16793#issuecomment-362684399



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23328) Disallow default value None when 'to_replace' is not a dictionary

2018-02-03 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-23328:


 Summary: Disallow default value None when 'to_replace' is not a 
dictionary
 Key: SPARK-23328
 URL: https://issues.apache.org/jira/browse/SPARK-23328
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 2.3.0
Reporter: Hyukjin Kwon


We happened to set {{None}} as a default value via SPARK-19454, which is quite 
weird.
Looks like we should only do this when the input is a dictionary.
Please see https://github.com/apache/spark/pull/16793#issuecomment-362684399
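
For context, a short PySpark sketch of why the {{value}} default matters: with
a dict for {{to_replace}} the replacements come from the dict itself, so a
{{None}} default lets callers omit {{value}}, but for non-dict inputs it only
masks a missing argument (Spark 2.x behavior assumed):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("replace-default").getOrCreate()
df = spark.createDataFrame([('Alice', 10, 80.0)])

# Dict form: value can be omitted because the dict supplies the replacements.
print(df.replace({"Alice": "a"}).first())  # Row(_1=u'a', _2=10, _3=80.0)

# Non-dict form: there is nothing to replace with, so a None default is
# meaningless here -- this is the case the issue proposes to disallow.
# df.replace("Alice")  # should raise instead of silently accepting None
{code}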



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-21658:
-

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases: 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21658.
-
Resolution: Invalid

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases: 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19454) Improve DataFrame.replace API

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351480#comment-16351480
 ] 

Apache Spark commented on SPARK-19454:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20499

> Improve DataFrame.replace API
> -
>
> Key: SPARK-19454
> URL: https://issues.apache.org/jira/browse/SPARK-19454
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.5.0, 1.6.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 2.2.0
>
>
> The current implementation suffers from the following issues:
> - It is possible to use a {{dict}} as {{to_replace}}, but we cannot skip or 
> use {{None}} as the {{value}} argument (although it is ignored). This 
> requires passing "magic" values:
> {code}
> df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
> df.replace({"Alice": "Bob"}, 1)
> {code}
> - The code doesn't check whether the provided types are correct. This can
> lead to an exception in Py4j (harder to diagnose):
> {code}
>  df.replace({"Alice": 1}, 1)
> {code}
> or to silent failures (with the bundled Py4j version):
> {code}
>  df.replace({1: 2, 3.0: 4.1, "a": "b"}, 1)
> {code}
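As an illustration of the type-checking point, a rough sketch of the kind of client-side validation the issue asks for; the helper name and the exact rules here are assumptions, not the implementation that was merged:

{code:python}
# Sketch only: validate replacement types in Python before anything
# reaches Py4J, so mixed-type mappings fail fast with a clear error.
def _check_replace_types(to_replace, value=None):
    valid = (bool, int, float, str)  # assumed set of supported scalars
    if isinstance(to_replace, dict):
        for k, v in to_replace.items():
            if not isinstance(k, valid) or not isinstance(v, valid):
                raise TypeError("keys and values must be bool, int, float or str")
            # Numbers may only map to numbers, strings only to strings.
            if isinstance(k, str) != isinstance(v, str):
                raise TypeError("cannot mix string and numeric replacements")
    elif value is None:
        raise TypeError("value is required when to_replace is not a dict")

try:
    _check_replace_types({"Alice": 1})  # the problematic call from above
except TypeError as e:
    print(e)  # cannot mix string and numeric replacements
{code}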






[jira] [Commented] (SPARK-22036) BigDecimal multiplication sometimes returns null

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351366#comment-16351366
 ] 

Apache Spark commented on SPARK-22036:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/20498

> BigDecimal multiplication sometimes returns null
> 
>
> Key: SPARK-22036
> URL: https://issues.apache.org/jira/browse/SPARK-22036
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Olivier Blanvillain
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.3.0
>
>
> The multiplication of two BigDecimal numbers sometimes returns null. Here is 
> a minimal reproduction:
> {code:java}
> object Main extends App {
>   import org.apache.spark.SparkConf
>   import org.apache.spark.sql.SparkSession
>
>   val conf = new SparkConf().setMaster("local[*]").setAppName("REPL").set("spark.ui.enabled", "false")
>   val spark = SparkSession.builder().config(conf).appName("REPL").getOrCreate()
>   implicit val sqlContext = spark.sqlContext
>   import spark.implicits._  // provides the product encoder for X2 below
>
>   case class X2(a: BigDecimal, b: BigDecimal)
>   val ds = sqlContext.createDataset(List(X2(BigDecimal(-0.1267333984375), BigDecimal(-1000.1))))
>   val result = ds.select(ds("a") * ds("b")).collect.head
>   println(result) // prints [null]
> }
> {code}
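A plausible reading of the failure, assuming both columns are inferred as Decimal(38, 18): SQL decimal typing gives the product precision p1 + p2 + 1 = 77 and scale s1 + s2 = 36; capping precision at Spark's 38-digit maximum while keeping scale 36 leaves room for only two integral digits, so the expected value of roughly 126.76 cannot be represented and the result comes back null rather than overflowing.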






[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-03 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351333#comment-16351333
 ] 

Felix Cheung commented on SPARK-23314:
--

I've isolated this down to this particular file

[https://raw.githubusercontent.com/BuzzFeedNews/2016-04-federal-surveillance-planes/master/data/feds/feds3.csv]

Without converting to pandas it seems to read fine, so I'm not sure it's a data
problem.

> Pandas grouped udf on dataset with timestamp column error 
> --
>
> Key: SPARK-23314
> URL: https://issues.apache.org/jira/browse/SPARK-23314
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Priority: Major
>
> Under  SPARK-22216
> When testing pandas_udf on group bys, I saw this error with the timestamp 
> column.
> File "pandas/_libs/tslib.pyx", line 3593, in 
> pandas._libs.tslib.tz_localize_to_utc
> AmbiguousTimeError: Cannot infer dst time from Timestamp('2015-11-01 
> 01:29:30'), try using the 'ambiguous' argument
> For details, see the comment box. I'm able to reproduce this on the latest
> branch-2.3 (last change from Feb 1 UTC).
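The error itself is reproducible in plain pandas, independent of Spark, for any wall-clock time that falls in the repeated hour of a DST fallback; the timezone below is an assumption for illustration:

{code:python}
import pandas as pd

# 2015-11-01 01:29:30 occurs twice in US/Eastern (clocks fall back at
# 02:00), so pandas cannot localize it without a hint.
ts = pd.Timestamp("2015-11-01 01:29:30")
try:
    ts.tz_localize("US/Eastern")
except Exception as e:  # pytz.exceptions.AmbiguousTimeError
    print(type(e).__name__, e)

# The 'ambiguous' argument the message suggests resolves the ambiguity:
print(ts.tz_localize("US/Eastern", ambiguous=True))  # treat as the DST instant
{code}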






[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351331#comment-16351331
 ] 

Apache Spark commented on SPARK-21658:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20496

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases:
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.






[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351326#comment-16351326
 ] 

Apache Spark commented on SPARK-21658:
--

User 'byakuinss' has created a pull request for this issue:
https://github.com/apache/spark/pull/18895

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases:
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.






[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351327#comment-16351327
 ] 

Hyukjin Kwon commented on SPARK-21658:
--

For now, I am going to revert this alias matching, considering we are close to
2.3.0 and given the explicit objection.

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases:
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.






[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-03 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351324#comment-16351324
 ] 

Hyukjin Kwon commented on SPARK-21658:
--

I think the actual root cause is that we allowed a dictionary for
{{to_replace}} in the first place. Maybe we can move the discussion of the API
and implementation details to SPARK-19454 or
https://github.com/apache/spark/pull/16793.


> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases:
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.






[jira] [Resolved] (SPARK-23305) Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources

2018-02-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-23305.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.3.0

> Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources
> -
>
> Key: SPARK-23305
> URL: https://issues.apache.org/jira/browse/SPARK-23305
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> Like Parquet, all file-based data sources handle
> `spark.sql.files.ignoreMissingFiles` correctly. We should add test coverage
> for feature parity and to prevent future accidental regressions across all
> data sources.
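A rough sketch of the behavior such a test would pin down, assuming an active SparkSession named {{spark}}; the path is a placeholder:

{code:python}
# Sketch only: with the flag enabled, files that disappear between query
# planning and execution are skipped instead of failing the whole job.
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

df = spark.read.parquet("/tmp/tbl")  # hypothetical directory of parquet files
# ... one of the underlying files is deleted out-of-band here ...
df.count()  # succeeds, silently skipping the missing file
{code}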






[jira] [Resolved] (SPARK-23311) add FilterFunction test case for test CombineTypedFilters

2018-02-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-23311.
-
   Resolution: Fixed
 Assignee: caoxuewen
Fix Version/s: 2.3.0

> add FilterFunction test case for test CombineTypedFilters
> -
>
> Key: SPARK-23311
> URL: https://issues.apache.org/jira/browse/SPARK-23311
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: caoxuewen
>Assignee: caoxuewen
>Priority: Minor
> Fix For: 2.3.0
>
>
> The current test cases for CombineTypedFilters lack a test of FilterFunction,
> so let's add one. In addition, let's extract a common LocalRelation in
> TypedFilterOptimizationSuite's existing test cases.


