[jira] [Closed] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-10174. --- Resolution: Won't Fix Close it since the PR has been closed. > refactor out project, filter, ordering

[jira] [Closed] (SPARK-10154) remove the no-longer-necessary CatalystScan

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-10154. --- Resolution: Won't Fix Keep it based on the PR discussion > remove the no-longer-necessary CatalystScan >

[jira] [Closed] (SPARK-8436) Inconsistent behavior when converting a Timestamp column to Integer/Long and then convert back to Timestamp

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-8436. -- Resolution: Won't Fix > Inconsistent behavior when converting a Timestamp column to Integer/Long and > then

[jira] [Commented] (SPARK-8436) Inconsistent behavior when converting a Timestamp column to Integer/Long and then convert back to Timestamp

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557282#comment-15557282 ] Xiao Li commented on SPARK-8436: After reading the PR description, this is not valid now. > Inconsistent

[jira] [Commented] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557281#comment-15557281 ] Yanbo Liang commented on SPARK-17825: - Sure. You can definitely contribute on this issue after my PR.

[jira] [Closed] (SPARK-14229) PySpark DataFrame.rdd's can't be saved to an arbitrary Hadoop OutputFormat

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-14229. --- Resolution: Won't Fix I don't think this is really a bug - if you want to save from dataframes there is the

[jira] [Commented] (SPARK-13585) addPyFile behavior change between 1.6 and before

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557248#comment-15557248 ] holdenk commented on SPARK-13585: - What is the use case for overwriting the old pyFile? The current scala

[jira] [Commented] (SPARK-13606) Error from python worker: /usr/local/bin/python2.7: undefined symbol: _PyCodec_LookupTextEncoding

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557244#comment-15557244 ] holdenk commented on SPARK-13606: - Are you still experiencing this? > Error from python worker:

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557242#comment-15557242 ] holdenk commented on SPARK-13534: - For people following along arrow is in the middle of voting on its

[jira] [Closed] (SPARK-13368) PySpark JavaModel fails to extract params from Spark side automatically

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-13368. --- Resolution: Fixed > PySpark JavaModel fails to extract params from Spark side automatically >

[jira] [Commented] (SPARK-13368) PySpark JavaModel fails to extract params from Spark side automatically

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557239#comment-15557239 ] holdenk commented on SPARK-13368: - It seems that we don't have this in the example anymore, although

[jira] [Updated] (SPARK-9965) Scala, Python SQLContext input methods' deprecation statuses do not match

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-9965: --- Component/s: (was: SQL) > Scala, Python SQLContext input methods' deprecation statuses do not match >

[jira] [Commented] (SPARK-9938) Constant folding in binaryComparison

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557235#comment-15557235 ] Xiao Li commented on SPARK-9938: After reading the PR discussion, I think we can first close it now. If

[jira] [Closed] (SPARK-9938) Constant folding in binaryComparison

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-9938. -- Resolution: Won't Fix > Constant folding in binaryComparison > > >

[jira] [Commented] (SPARK-13303) Spark fails with pandas import error when pandas is not explicitly imported by user

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557230#comment-15557230 ] holdenk commented on SPARK-13303: - What about if we added a requirements file? We have one for our dev

[jira] [Commented] (SPARK-11722) Rdds could be different between orginal one and save-out-then-read-in one

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557226#comment-15557226 ] holdenk commented on SPARK-11722: - Is this still an issue you are experiencing and if so do you have

[jira] [Commented] (SPARK-12776) Implement Python API for Datasets

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557224#comment-15557224 ] holdenk commented on SPARK-12776: - Just re-opening discussion here - the migration to datasets was given

[jira] [Commented] (SPARK-9842) Push down Spark SQL UDF to datasource UDF

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557222#comment-15557222 ] Xiao Li commented on SPARK-9842: cc [~tsuresh] > Push down Spark SQL UDF to datasource UDF >

[jira] [Commented] (SPARK-12100) bug in spark/python/pyspark/rdd.py portable_hash()

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557219#comment-15557219 ] holdenk commented on SPARK-12100: - Just noting related progress in

[jira] [Commented] (SPARK-11874) DistributedCache for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557217#comment-15557217 ] holdenk commented on SPARK-11874: - I think this is not intended to be supported, although I'm not super

[jira] [Closed] (SPARK-12774) DataFrame.mapPartitions apply function operates on Pandas DataFrame instead of a generator or rows

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12774. --- Resolution: Won't Fix In some ways yes avoiding unnecessary iteration can be good, but allowing Spark to

[jira] [Commented] (SPARK-9732) remove the unsafe -> safe conversion

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557203#comment-15557203 ] Xiao Li commented on SPARK-9732: Should we close this? > remove the unsafe -> safe conversion >

[jira] [Commented] (SPARK-11571) Twitter Api for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557205#comment-15557205 ] holdenk commented on SPARK-11571: - Is there anything you are looking to do with this API? It can

[jira] [Commented] (SPARK-3600) RDD[Double] doesn't use primitive arrays for caching

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557195#comment-15557195 ] holdenk commented on SPARK-3600: Is this something we still want to work on or does `Datasets` make this

[jira] [Commented] (SPARK-3513) Provide a utility for running a function once on each executor

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557193#comment-15557193 ] holdenk commented on SPARK-3513: This seems closely related to SPARK-650 and SPARK-636 as well. Is this

[jira] [Closed] (SPARK-9764) Spark SQL uses table metadata inconsistently

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-9764. -- Resolution: Fixed > Spark SQL uses table metadata inconsistently >

[jira] [Commented] (SPARK-9764) Spark SQL uses table metadata inconsistently

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557188#comment-15557188 ] Xiao Li commented on SPARK-9764: This should be resolved in the latest branch. Please check it. Thanks! >

[jira] [Comment Edited] (SPARK-9764) Spark SQL uses table metadata inconsistently

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557188#comment-15557188 ] Xiao Li edited comment on SPARK-9764 at 10/8/16 4:38 AM: - This should be resolved

[jira] [Commented] (SPARK-3348) Support user-defined SparkListeners properly

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557186#comment-15557186 ] holdenk commented on SPARK-3348: Is there still interest in seeing this happen? Should we ping the dev@

[jira] [Commented] (SPARK-9342) Spark SQL views don't work

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557184#comment-15557184 ] Xiao Li commented on SPARK-9342: This should be resolved since Spark 2.0. Please check it. If you still

[jira] [Resolved] (SPARK-9342) Spark SQL views don't work

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-9342. Resolution: Fixed > Spark SQL views don't work > -- > > Key:

[jira] [Closed] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-3312. -- Resolution: Won't Fix > Add a groupByKey which returns a special GroupBy object like in pandas >

[jira] [Commented] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Lei Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557181#comment-15557181 ] Lei Wang commented on SPARK-17825: -- That's good. May I take part in this job? By the way, are you

[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557182#comment-15557182 ] holdenk commented on SPARK-3312: I'm going to go ahead and close this, now that `Datasets` are here they

[jira] [Commented] (SPARK-3132) Avoid serialization for Array[Byte] in TorrentBroadcast

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557178#comment-15557178 ] holdenk commented on SPARK-3132: Is there any progress on this or would it be ok for me to take a look

[jira] [Closed] (SPARK-8957) Backport Hive 1.X support to Branch 1.4

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-8957. -- Resolution: Won't Fix Thanks. Close it now > Backport Hive 1.X support to Branch 1.4 >

[jira] [Commented] (SPARK-2722) Mechanism for escaping spark configs is not consistent

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557158#comment-15557158 ] holdenk commented on SPARK-2722: I think at this point trying to change the escaping of the different

[jira] [Closed] (SPARK-1792) Missing Spark-Shell Configure Options

2016-10-07 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-1792. -- Resolution: Fixed > Missing Spark-Shell Configure Options > - > >

[jira] [Commented] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557153#comment-15557153 ] holdenk commented on SPARK-2032: I'm assuming since there hasn't been any activity for awhile [~prashant_]

[jira] [Commented] (SPARK-1865) Improve behavior of cleanup of disk state

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557149#comment-15557149 ] holdenk commented on SPARK-1865: So ALS specifically has a work around for this with cleaning up shuffle

[jira] [Commented] (SPARK-1792) Missing Spark-Shell Configure Options

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557146#comment-15557146 ] holdenk commented on SPARK-1792: It feels like we've already got a pretty good mechanism for handling this

[jira] [Closed] (SPARK-9309) Support DecimalType and TimestampType in UnsafeRowConverter

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-9309. -- Resolution: Won't Fix Based on the above discussion, it sounds like this JIRA is not needed. Close it now > Support

[jira] [Updated] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-17825: Component/s: (was: MLlib) ML > Expose log likelihood of EM algorithm in mllib

[jira] [Commented] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557137#comment-15557137 ] Yanbo Liang commented on SPARK-17825: - [~is03wlei] This task depends on copying the GaussianMixture

[jira] [Commented] (SPARK-8957) Backport Hive 1.X support to Branch 1.4

2016-10-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557139#comment-15557139 ] Michael Armbrust commented on SPARK-8957: - Yeah, close it. > Backport Hive 1.X support to Branch

[jira] [Commented] (SPARK-1762) Add functionality to pin RDDs in cache

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557133#comment-15557133 ] holdenk commented on SPARK-1762: Is this something we are still interested in? I could see it becoming

[jira] [Updated] (SPARK-6802) User Defined Aggregate Function Refactoring

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-6802: --- Component/s: (was: SQL) > User Defined Aggregate Function Refactoring >

[jira] [Updated] (SPARK-9189) Takes locality and the sum of partition length into account when partition is instance of HadoopPartition in operator coalesce

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-9189: --- Component/s: (was: SQL) Spark Core > Takes locality and the sum of partition length into

[jira] [Closed] (SPARK-9194) fix case-insensitive bug for aggregation expression which is not PartialAggregate

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-9194. -- Resolution: Won't Fix > fix case-insensitive bug for aggregation expression which is not > PartialAggregate >

[jira] [Resolved] (SPARK-8624) DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-8624. Resolution: Won't Fix > DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet >

[jira] [Commented] (SPARK-8957) Backport Hive 1.X support to Branch 1.4

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557099#comment-15557099 ] Xiao Li commented on SPARK-8957: Is this still needed? Maybe we should close it? > Backport Hive 1.X

[jira] [Commented] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557069#comment-15557069 ] holdenk commented on SPARK-10161: - That being said - I'm not sure I see the value of this? > Support

[jira] [Commented] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557068#comment-15557068 ] holdenk commented on SPARK-10161: - I think this is an issue across cluster modes, maybe using IJupyter

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557034#comment-15557034 ] Vincent commented on SPARK-17219: - [~josephkb] [~srowen] [~timhunter] let me know what I can do to help

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557021#comment-15557021 ] Vincent commented on SPARK-17219: - in this PR(https://github.com/apache/spark/pull/14858) NaN values are

[jira] [Updated] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-9487: --- Labels: starter (was: ) > Use the same num. worker threads in Scala/Python unit tests >

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557018#comment-15557018 ] holdenk commented on SPARK-9487: This will maybe break some tests in the process but it would probably be

[jira] [Commented] (SPARK-17782) Kafka 010 test is flaky

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557013#comment-15557013 ] Apache Spark commented on SPARK-17782: -- User 'koeninger' has created a pull request for this issue:

[jira] [Closed] (SPARK-8760) allow moving and symlinking binaries

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8760. -- Resolution: Fixed This is a "partially fixed" but I think fixed is a close enough description. We don't use

[jira] [Closed] (SPARK-8757) Check missing and add user guide for MLlib Python API

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8757. -- Resolution: Fixed All sub issues fixed, and well past 1.5 release. > Check missing and add user guide for

[jira] [Commented] (SPARK-8842) Spark SQL - Insert into table Issue

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556986#comment-15556986 ] Xiao Li commented on SPARK-8842: Could you retry it in the latest master branch? Thanks! > Spark SQL -

[jira] [Commented] (SPARK-11272) Support importing and exporting event logs from HistoryServer web portal

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556866#comment-15556866 ] Apache Spark commented on SPARK-11272: -- User 'ajbozarth' has created a pull request for this issue:

[jira] [Commented] (SPARK-14503) spark.ml API for FPGrowth

2016-10-07 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556862#comment-15556862 ] yuhao yang commented on SPARK-14503: Yes, let me just send what I got. > spark.ml API for FPGrowth >

[jira] [Assigned] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17819: Assignee: (was: Apache Spark) > Specified database in JDBC URL is ignored when

[jira] [Commented] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556857#comment-15556857 ] Apache Spark commented on SPARK-17819: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Assigned] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17819: Assignee: Apache Spark > Specified database in JDBC URL is ignored when connecting to

[jira] [Closed] (SPARK-8719) Adding Python support for 1-sample, 2-sided Kolmogorov Smirnov Test

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8719. -- Resolution: Duplicate > Adding Python support for 1-sample, 2-sided Kolmogorov Smirnov Test >

[jira] [Updated] (SPARK-8605) Exclude files in StreamingContext. textFileStream(directory)

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-8605: --- Component/s: (was: PySpark) Streaming > Exclude files in StreamingContext.

[jira] [Commented] (SPARK-8605) Exclude files in StreamingContext. textFileStream(directory)

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556785#comment-15556785 ] holdenk commented on SPARK-8605: This is semi-documented (namely only atomic moves are supported), but

[jira] [Commented] (SPARK-7177) Create standard way to wrap Spark CLI scripts for external projects

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556780#comment-15556780 ] holdenk commented on SPARK-7177: I've run into similar challenges when working on Sparkling Pandas. >

[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556775#comment-15556775 ] holdenk commented on SPARK-7941: Are you still experiencing this issue [~cqnguyen] or would it be ok for

[jira] [Updated] (SPARK-8780) Move Python doctest code example from models to algorithms

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-8780: --- Labels: starter (was: ) > Move Python doctest code example from models to algorithms >

[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556767#comment-15556767 ] Apache Spark commented on SPARK-17647: -- User 'jodersky' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17647: Assignee: Apache Spark > SQL LIKE does not handle backslashes correctly >

[jira] [Assigned] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17647: Assignee: (was: Apache Spark) > SQL LIKE does not handle backslashes correctly >

[jira] [Commented] (SPARK-8780) Move Python doctest code example from models to algorithms

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556763#comment-15556763 ] holdenk commented on SPARK-8780: Is this something we still want to do? This could be a great starter

[jira] [Commented] (SPARK-6831) Document how to use external data sources

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556758#comment-15556758 ] holdenk commented on SPARK-6831: Is this something we are planning to do at all? It doesn't seem to have

[jira] [Closed] (SPARK-6780) Add saveAsTextFileByKey method for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-6780. -- Resolution: Won't Fix Since SPARK-3533 is WON'T FIX this one should be too. > Add saveAsTextFileByKey method

[jira] [Closed] (SPARK-7613) Serialization fails in pyspark for lambdas referencing class data members

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-7613. -- Resolution: Won't Fix I believe this is expected behaviour and the current best practice is simply to make a

[jira] [Commented] (SPARK-7638) Python API for pmml.export

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556732#comment-15556732 ] holdenk commented on SPARK-7638: Do we still want to do this or focus on adding PMML export on ML given

[jira] [Commented] (SPARK-6174) Improve doc: Python ALS, MatrixFactorizationModel

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556720#comment-15556720 ] holdenk commented on SPARK-6174: I think Bryan did a good job of this I'd be in favour of closing the

[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556714#comment-15556714 ] holdenk commented on SPARK-5981: I'm not sure porting the models to Python sounds like a good idea, giving

[jira] [Resolved] (SPARK-4851) "Uninitialized staticmethod object" error in PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-4851. Resolution: Fixed The provided repro now runs (although we need to provide it with the correct number of

[jira] [Commented] (SPARK-1425) PySpark can crash Executors if worker.py fails while serializing data

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556680#comment-15556680 ] holdenk commented on SPARK-1425: Is this still an issue or do we have a repro case for it? The current

[jira] [Closed] (SPARK-5160) Python module in jars

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-5160. -- Resolution: Fixed This is now supported. > Python module in jars > - > >

[jira] [Assigned] (SPARK-17834) Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17834: Assignee: Apache Spark (was: Shixiong Zhu) > Fetch the earliest offsets manually in

[jira] [Assigned] (SPARK-17834) Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17834: Assignee: Shixiong Zhu (was: Apache Spark) > Fetch the earliest offsets manually in

[jira] [Commented] (SPARK-17834) Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556663#comment-15556663 ] Apache Spark commented on SPARK-17834: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Updated] (SPARK-17834) Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-07 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17834: - Issue Type: Sub-task (was: Bug) Parent: SPARK-15406 > Fetch the earliest offsets

[jira] [Commented] (SPARK-4488) Add control over map-side aggregation

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556650#comment-15556650 ] holdenk commented on SPARK-4488: So while the associated PR is closed, we ended up adding the option to

[jira] [Updated] (SPARK-17834) Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-07 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17834: - Summary: Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

[jira] [Created] (SPARK-17834) Fetch the initial offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-07 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-17834: Summary: Fetch the initial offsets manually in KafkaSource instead of counting on KafkaConsumer Key: SPARK-17834 URL: https://issues.apache.org/jira/browse/SPARK-17834

[jira] [Commented] (SPARK-17626) TPC-DS performance improvements using star-schema heuristics

2016-10-07 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556645#comment-15556645 ] Reynold Xin commented on SPARK-17626: - Thanks - this makes sense (especially the bushy tree part).

[jira] [Resolved] (SPARK-2999) Compress all the serialized data

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-2999. Resolution: Fixed Fixed in b5c51c8df480f1a82a82e4d597d8eea631bffb4e > Compress all the serialized data >

[jira] [Resolved] (SPARK-8791) Make a better hashcode for InternalRow

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-8791. Resolution: Fixed > Make a better hashcode for InternalRow > -- > >

[jira] [Commented] (SPARK-8791) Make a better hashcode for InternalRow

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556603#comment-15556603 ] Xiao Li commented on SPARK-8791: This sounds like it has been resolved in a later version. Let me close it now.

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556582#comment-15556582 ] holdenk commented on SPARK-2868: Is this something we are still interested in pursuing (cc [~rxin] who did

[jira] [Resolved] (SPARK-2654) Leveled logging in PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-2654. Resolution: Fixed This has been fixed in SPARK-3444 / ae98eec730125c1153dcac9ea941959cc79e4f42 > Leveled

[jira] [Commented] (SPARK-8527) StructType's Factory method does not work in java code

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556567#comment-15556567 ] Xiao Li commented on SPARK-8527: This should have been resolved. Could you retry it in the master branch?

[jira] [Resolved] (SPARK-8527) StructType's Factory method does not work in java code

2016-10-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-8527. Resolution: Fixed > StructType's Factory method does not work in java code >
