[GitHub] spark pull request #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16854#discussion_r100229312

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -361,6 +362,41 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   }

   /**
+   * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
+   *
+   * Unless the schema is specified using the `schema` function, this function goes through the
+   * input once to determine the input schema.
+   *
+   * @param csvDataset input Dataset with one CSV row per record
+   * @since 2.2.0
+   */
+  def csv(csvDataset: Dataset[String]): DataFrame = {
+    val parsedOptions: CSVOptions = new CSVOptions(extraOptions.toMap)
--- End diff --

Just to help review: there is a similar code path in https://github.com/apache/spark/blob/3d314d08c9420e74b4bb687603cdd11394eccab5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala#L105-L125
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16854

Cc @cloud-fan, do you mind if I ask whether you think it is worth adding this API?
[GitHub] spark issue #7963: [SPARK-6227] [MLlib] [PySpark] Implement PySpark wrappers...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/7963

Ping @MechCoder, are you able to proceed with this PR and address the comments above? If not, it might be good to close this for now.
[GitHub] spark issue #8374: [SPARK-10101] [SQL] Add maxlength to JDBC field metadata ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/8374

ping @rama-mullapudi
[GitHub] spark issue #8384: [SPARK-8510] [CORE] [PYSPARK] NumPy matrices as values in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/8384

@paberline, can we then close this for now? I guess it is a soft-yes for closing.
[GitHub] spark issue #8785: [Spark-10625] [SQL] Spark SQL JDBC read/write is unable t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/8785

ping @tribbloid. Are you able to address the review comments? If not, it'd be better to close this for now.
[GitHub] spark issue #10262: [SPARK-12270][SQL]remove empty space after getString fro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/10262

Hi @huaxingao, would this be better closed for now?
[GitHub] spark issue #11192: [SPARK-13257] [Improvement] Scala naive Bayes example: c...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11192

Hi @movelikeriver, are you able to take this further? If not, it might be better to close this for now.
[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11211

(ping @zjffdu)
[GitHub] spark issue #11374: [SPARK-12042] Python API for mllib.stat.test.StreamingTe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11374

Hi @yinxusen, are you able to take this further? If not, it might be better to close this for now.
[GitHub] spark issue #11692: [SPARK-13852][YARN]handle the InterruptedException cause...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11692

Hi @vanzin, shall we then treat this as a soft suggestion to close this if there is no objection for about a week?
[GitHub] spark issue #11887: [SPARK-13041][Mesos]add driver sandbox uri to the dispat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11887

Hi @skonto, are you able to take this PR further? If not, it might be better to close it for now.
[GitHub] spark issue #12243: [SPARK-14467][SQL] Interleave CPU and IO better in FileS...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12243

Hi @nongli, I just happened to look at this PR. It seems to have been inactive for a few months without responses to the review comments. Would this be better closed for now?
[GitHub] spark issue #12335: [SPARK-11321] [SQL] Python non null udfs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12335

Hi @kevincox, I happened to look at this PR. It seems to have been inactive for a few months while there are some review comments. Would this be better closed if you are currently not able to take it further?
[GitHub] spark issue #12337: [SPARK-15566] Expose null checking function to Python la...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12337

@kevincox, it seems inactive for a few months. Should this maybe be closed for now if you are currently not able to take it further?
[GitHub] spark issue #12398: [SPARK-5929][PYSPARK] Context addPyPackage and addPyRequ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12398

ping @buckhx
[GitHub] spark issue #12620: [SPARK-14859][PYSPARK] Make Lambda Serializer Configurab...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12620

@njwhite, it seems inactive for a few months. Would it be better to close this for now if you are currently not able to take it further?
[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12675

Hi @GayathriMurali, the review comments seem to have gone unanswered for a few months. Would this be better closed for now if you are not able to take it further?
[GitHub] spark issue #12697: [SPARK-14754][SPARK CORE] Metrics as logs are not coming...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12697

gentle ping @mihir6692
[GitHub] spark issue #12800: [SPARK-15024] NoClassDefFoundError in spark-examples due...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12800

Hi @atokhy, it seems inactive without an answer to the comment above. Would this be better closed if you are not able to take it further?
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12933

Hi @hellertime, this PR seems inactive for a few months after the last review comments above. Would it be better closed if you are not currently able to work on it further?
[GitHub] spark issue #13467: [SPARK-15642][SQL][WIP] Metadata gets lost when selectin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13467

Hi @zommerfelds, do you mind if I ask whether you are still working on this? It seems inactive for more than half a year. Maybe it'd be better closed for now if you are currently not able to work on it further.
[GitHub] spark issue #13715: [SPARK-15992] [MESOS] Refactor MesosCoarseGrainedSchedul...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13715

(ping @drcrallen)
[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13891

@hqzizania If you check the logs, there is some guidance on how to proceed. Should we maybe rebase this and check the logs again?
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14266

(gentle ping @ooq)
[GitHub] spark issue #14321: [SPARK-8971][ML] Add stratified sampling to ML CrossVali...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14321

I just happened to look at this PR. Is this still WIP, or is it waiting for more review comments? If it is simply that the author is not currently able to take it further, then maybe it'd be better to close it for now.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14461

(gentle ping @nblintao)
[GitHub] spark pull request #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14517#discussion_r100310447

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -747,16 +800,25 @@ def _test():
     except py4j.protocol.Py4JError:
         spark = SparkSession(sc)
+    seed = int(time() * 1000)
--- End diff --

@GregBowyer ping. Let me propose to close this after a week.
[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14579

Is there any reason why this has not been merged yet? I personally like this too.
[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14601

(gentle ping @agsachin)
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14936

(gentle ping @philipphoffmann)
[GitHub] spark issue #15159: [SPARK-17605][SPARK_SUBMIT] Add option spark.usePython a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15159

(gentle ping @zjffdu)
[GitHub] spark issue #15209: replace function type with function isinstance
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15209

@frankfqchen Could you follow up on the comment above? If you are not able to proceed further, I think it might be better to close this for now. Actually, I am not too sure it is worth sweeping them.
[GitHub] spark issue #15267: [SPARK-17667] [YARN][WIP]Make locking fine grained in Ya...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15267

Hi @ashwinshankar77, if you are not currently able to work on this further, maybe it should be closed for now. It seems inactive for a few months.
[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15861

(gentle ping @jiangxb1987)
[GitHub] spark issue #15871: [SPARK-17116][Pyspark] Allow parameters to be {string,va...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15871

Hi @aditya1702, do you mind if I ask whether you are still working on this? Maybe it should be closed for now if you are currently not able to work on it further. It seems inactive for a few months.
[GitHub] spark issue #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16199

What do you think about my suggestion, @NathanHowell?
[GitHub] spark issue #16278: [SPARK-18779][STREAMING][KAFKA] Messages being received ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16278

(@pnakhe gentle ping, I am curious too)
[GitHub] spark issue #16324: [SPARK-18910][SQL]Resolve faile to use UDF that jar file...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16324

(gentle ping @henh062326)
[GitHub] spark issue #16319: [WiP][SPARK-18699] SQL - parsing CSV should return null ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16319

Hi @kubatyszko, are you still working on this? If you are currently unable to proceed further, maybe it should be closed for now. It seems inactive for a few months.
[GitHub] spark issue #16083: [SPARK-18097][SQL] Add exception catch to handle corrupt...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16083

Hi @jayadevanmurali, are you still working on this? It seems inactive for a few months. Maybe it'd be better closed for now if you are currently not able to proceed further.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100332529

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -298,22 +312,22 @@ class JacksonParser(
         // Here, we pass empty `PartialFunction` so that this case can be
         // handled as a failed conversion. It will throw an exception as
         // long as the value is not null.
-        parseJsonToken(parser, dataType)(PartialFunction.empty[JsonToken, Any])
+        parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }

   /**
    * This method skips `FIELD_NAME`s at the beginning, and handles nulls ahead before trying
    * to parse the JSON token using given function `f`. If the `f` failed to parse and convert the
    * token, call `failedConversion` to handle the token.
    */
-  private def parseJsonToken(
+  private def parseJsonToken[R >: Null](
--- End diff --

Yes, I said +1 because it explicitly expresses that the type should be nullable, and I _assumed_ (I did not check the bytecode myself, so I might be wrong) that it gives a hint to the compiler because `Null` is `null`able (I remember googling and reading some references for several days when I was investigating another null-related PR).
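[Editor's note] To make the `R >: Null` point concrete, here is a minimal standalone Scala sketch, independent of the Spark code above; `parse` and its matcher are made up for illustration. The lower bound is what lets a generic method return `null`:

```scala
// A lower bound of `Null` constrains R to nullable reference types,
// so `null` is a valid value of R and the method typechecks.
object NullBoundDemo {
  // Without the bound this would not compile, since R could be e.g. Int:
  //   def parse[R](s: String)(f: PartialFunction[String, R]): R = ... null ...
  def parse[R >: Null](s: String)(f: PartialFunction[String, R]): R =
    f.applyOrElse(s, (_: String) => null)  // ok: Null <: R

  def main(args: Array[String]): Unit = {
    println(parse("a") { case "a" => "matched" })  // matched
    println(parse("b") { case "a" => "matched" })  // null
  }
}
```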
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777

cc @cloud-fan, WDYT?
[GitHub] spark pull request #16882: [SPARK-19544][SQL] Improve error message when som...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16882

[SPARK-19544][SQL] Improve error message when some column types are compatible and others are not in set/union operations

## What changes were proposed in this pull request?

This PR proposes to fix the error message when some data types are compatible and others are not in set/union operations.

```scala
Seq((1, ("a", 1))).toDF.union(Seq((1L, ("a", "b"))).toDF)
```

**Before**

```
Union can only be performed on tables with the compatible column types.
LongType <> IntegerType at the first column of the second table;;
```

**After**

```
Union can only be performed on tables with the compatible column types.
StructType(StructField(_1,StringType,true), StructField(_2,StringType,true)) <>
StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false))
at the second column of the second table;;
```

## How was this patch tested?

Unit tests in `AnalysisErrorSuite` and manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19544

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16882.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16882

commit 07e698415bb6a48e60cd2359cd9d412c2f61e48b
Author: hyukjinkwon
Date: 2017-02-10T05:01:16Z

    Improve error message when some column types are compatible and others are not in set/union operations
[GitHub] spark issue #16882: [SPARK-19544][SQL] Improve error message when some colum...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16882

cc @hvanhovell could you please take a look?
[GitHub] spark pull request #16881: [SPARK-19543] from_json fails when the input row ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16881#discussion_r100477537

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -496,7 +496,7 @@ case class JsonToStruct(schema: StructType, options: Map[String, String], child:
   override def dataType: DataType = schema

   override def nullSafeEval(json: Any): Any = {
-    try parser.parse(json.toString).head catch {
+    try parser.parse(json.toString).headOption.orNull catch {
--- End diff --

(Not for this PR, but maybe loosely related, I guess.) I was thinking it is a bit odd that we support reading only a single row when the input is a JSON array. For example:

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val schema = StructType(StructField("a", IntegerType) :: Nil)
Seq(("""[{"a": 1}, {"a": 2}]""")).toDF("struct").select(from_json(col("struct"), schema)).show()
```

```
+--------------------+
|jsontostruct(struct)|
+--------------------+
|                 [1]|
+--------------------+
```

I think maybe we should not support this in that function, or it should work like a generator expression.
[GitHub] spark issue #16882: [SPARK-19544][SQL] Improve error message when some colum...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16882

retest this please
[GitHub] spark pull request #16881: [SPARK-19543] from_json fails when the input row ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16881#discussion_r100521142

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -496,7 +496,7 @@ case class JsonToStruct(schema: StructType, options: Map[String, String], child:
   override def dataType: DataType = schema

   override def nullSafeEval(json: Any): Any = {
-    try parser.parse(json.toString).head catch {
+    try parser.parse(json.toString).headOption.orNull catch {
--- End diff --

Thank you for confirming.
[GitHub] spark issue #16882: [SPARK-19544][SQL] Improve error message when some colum...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16882

Oh, sure. Let me give it a shot. Thank you for your quick review!
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777

Let me check other DBMSs and get back.
[GitHub] spark issue #16890: when colum is use alias ,the order by result is wrong
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16890

Could you please close this and ask the question on the Spark user mailing list?
[GitHub] spark issue #16278: [SPARK-18779][STREAMING][KAFKA] Messages being received ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16278

Aha, thanks for the details. Then is this PR/JIRA closable, maybe?
[GitHub] spark issue #13467: [SPARK-15642][SQL][WIP] Metadata gets lost when selectin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13467

Aha, then to be strict, it is not WIP but just waiting for feedback. How about closing this and pinging, in the JIRA, the people who touched this code path lately?
[GitHub] spark issue #11692: [SPARK-13852][YARN]handle the InterruptedException cause...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11692

Let me propose to close this after a week if the author seems inactive on this.
[GitHub] spark issue #8785: [Spark-10625] [SQL] Spark SQL JDBC read/write is unable t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/8785

I am not supposed to decide what to merge, but I left the comment because this PR seems inactive with respect to the review comments, and I assumed it is currently abandoned because the author happened to be unable to proceed further for now. I'd rebase, address the review comments, and keep pinging the related people here.
[GitHub] spark issue #16083: [SPARK-18097][SQL] Add exception catch to handle corrupt...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16083

Actually, I don't know (or have not heard of) how the schema gets corrupted. I guess a committer should verify this before merging it. How about asking on the dev/user mailing list if you are not sure? If this is expected to stay open without reproducible steps for a long time (like a few weeks or months), I guess we should maybe close it for now and reopen it later.
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777

**Postgres**

```
postgres=# SELECT greatest(array[1], array[0.1]);
 greatest
----------
 {1}
(1 row)

postgres=# SELECT least(array[1], array[0.1]);
 least
-------
 {0.1}
(1 row)
```

```
postgres=# SELECT * FROM (values (array[1]), (array[0.1])) as foo;
 column1
---------
 {1}
 {0.1}
(2 rows)
```

```
postgres=# SELECT array[1] UNION SELECT array[0.1];
 array
-------
 {0.1}
 {1}
(2 rows)
```

```
postgres=# SELECT CASE WHEN TRUE THEN array[0.1] ELSE array[1] END;
 array
-------
 {0.1}
(1 row)
```

(Sorry, I could not find a proper way to test this with `IF`, so I used `CASE`/`WHEN` in Postgres.)

**Hive** - not supporting this type coercion. least/greatest seem to support only primitive types:

```
SELECT least(array(1), array(1D));
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '1': least only takes primitive types, got array<int>
```

```
SELECT inline(array(struct(array(0)), struct(array(1D))));
FAILED: SemanticException [Error 10016]: Line 1:38 Argument type mismatch '1D': Argument type "struct<col1:array<double>>" is different from preceding arguments. Previous type was "struct<col1:array<int>>"
```

```
SELECT array(1) UNION SELECT array(1D);
FAILED: SemanticException Schema of both sides of union should match: Column _c0 is of type array<int> on first table and type array<double> on second table. Cannot tell the position of null AST.
```

```
SELECT IF(1=1, array(1), array(1D));
FAILED: SemanticException [Error 10016]: Line 1:25 Argument type mismatch '1D': The second and the third arguments of function IF should have the same type, but they are different: "array<int>" and "array<double>"
```

**MySQL**

Seems not to support arrays.
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777

@cloud-fan, to cut this short, it seems Postgres supports this whereas Hive does not. It seems we now support implicit casts between `ArrayType`s via [SPARK-18624](https://issues.apache.org/jira/browse/SPARK-18624), for example as below:

```scala
sql("SELECT percentile_approx(10.0, array('1', '1', '1'), 100)").show()
```

Wouldn't it be more reasonable to allow this case?
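[Editor's note] For illustration, a minimal spark-shell sketch of the kind of coercion this PR proposes; the behavior shown is assumed from the PR's intent rather than quoted from it:

```scala
// Assuming this PR's widening between ArrayTypes is in place, both branches of
// IF can be coerced to a common array type instead of failing analysis:
// array<int> and array<double> would widen to array<double>.
spark.sql("SELECT IF(1 = 1, array(1), array(1.5D)) AS a").show()
// roughly:
// +-----+
// |    a|
// +-----+
// |[1.0]|
// +-----+
```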
[GitHub] spark pull request #16854: [SPARK-15463][SQL] Add an API to load DataFrame f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16854#discussion_r100667530

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -361,6 +362,41 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   }

   /**
+   * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
+   *
+   * Unless the schema is specified using the `schema` function, this function goes through the
+   * input once to determine the input schema.
+   *
+   * @param csvDataset input Dataset with one CSV row per record
+   * @since 2.2.0
+   */
+  def csv(csvDataset: Dataset[String]): DataFrame = {
--- End diff --

Sure. Actually, there are a JIRA and a closed PR, https://github.com/apache/spark/pull/13460 and [SPARK-15615](https://issues.apache.org/jira/browse/SPARK-15615), where I was negative because it could easily be worked around. However, I am fine with this if we are promoting the use of Datasets instead of RDDs for advantages such as [SPARK-18362](https://issues.apache.org/jira/browse/SPARK-18362).

cc @pjfanning, could you reopen and proceed with your PR if we are all fine?
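[Editor's note] For reviewers, a minimal usage sketch of the overload under review; `header`/`inferSchema` are the standard `DataFrameReader` CSV options, and the sample data is made up:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder().appName("csv-from-dataset").getOrCreate()
import spark.implicits._

// One CSV row per record, e.g. extracted from a mixed-format log.
val csvLines: Dataset[String] = Seq("name,age", "alice,30", "bob,25").toDS()

// With the proposed overload, the input is scanned once to infer the schema
// (unless one is supplied via `schema`), then parsed into a DataFrame.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(csvLines)

df.show()
// +-----+---+
// | name|age|
// +-----+---+
// |alice| 30|
// |  bob| 25|
// +-----+---+
```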
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16611

@rxin, does that look okay to you? I am worried about whether

> **SQL** - array-like form of integer, decimal, string and boolean

sounds okay to you.
[GitHub] spark pull request #16882: [SPARK-19544][SQL] Improve error message when som...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16882#discussion_r100667599

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -321,12 +321,12 @@ trait CheckAnalysis extends PredicateHelper {
           // Check if the data types match.
           dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
             // SPARK-18058: we shall not care about the nullability of columns
-            if (!dt1.sameType(dt2)) {
+            if (TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty) {
               failAnalysis(
                 s"""
                   |${operator.nodeName} can only be performed on tables with the compatible
-                  |column types. $dt1 <> $dt2 at the ${ordinalNumber(ci)} column of
-                  |the ${ordinalNumber(ti + 1)} table
+                  |column types. ${dt1.simpleString} <> ${dt2.simpleString} at the
--- End diff --

(I used `simpleString` for consistency with the other code in this file.)
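[Editor's note] As a side note on this choice, a quick sketch of the difference between the two renderings; the outputs shown in comments are approximate, from a spark-shell session:

```scala
import org.apache.spark.sql.types._

val dt = StructType(Seq(
  StructField("_1", StringType),
  StructField("_2", IntegerType, nullable = false)))

// Default rendering, as previously interpolated by `$dt1`:
dt.toString
// StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false))

// Compact SQL-style rendering, as used by the new message:
dt.simpleString
// struct<_1:string,_2:int>
```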
[GitHub] spark issue #13467: [SPARK-15642][SQL][WIP] Metadata gets lost when selectin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13467

Yes, exactly. That's usually what I do.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100668061

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,13 +101,13 @@ object TypeCoercion {
     case _ => None
   }

-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonType(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
+  /**
+   * Promotes all the way to StringType.
+   */
+  private def stringPromotion: (DataType, DataType) => Option[DataType] = {
--- End diff --

Oh, I will fix it.
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16824

gentle ping @holdenk (somehow writing `@holdenk` does not show your name ... https://cloud.githubusercontent.com/assets/6477701/22854544/3d5ec930-f0b4-11e6-82c8-195a725caaf4.png)
[GitHub] spark pull request #16882: [SPARK-19544][SQL] Improve error message when som...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16882#discussion_r100683242

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,7 +116,7 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
--- End diff --

Sure!
[GitHub] spark pull request #16882: [SPARK-19544][SQL] Improve error message when som...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16882#discussion_r100683913

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,7 +116,7 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
--- End diff --

(Added)
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 @cloud-fan, I just addressed your comments and tested a build with Scala 2.10. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16890: when colum is use alias ,the order by result is wrong
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16890 @muyannian Could you click the "Close pull request" button below? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100690775 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala --- @@ -379,6 +386,67 @@ class TypeCoercionSuite extends PlanTest { widenTest(ArrayType(IntegerType), StructType(Seq()), None) } + test("wider common type for decimal and array") { +def widenTestWithStringPromotion( +t1: DataType, +t2: DataType, +expected: Option[DataType]): Unit = { + checkWidenType(TypeCoercion.findWiderTypeForTwo, t1, t2, expected) +} + +def widenTestWithoutStringPromotion( +t1: DataType, +t2: DataType, +expected: Option[DataType]): Unit = { + checkWidenType(TypeCoercion.findWiderTypeWithoutStringPromotionForTwo, t1, t2, expected) +} + +// Decimal +widenTestWithStringPromotion( + DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2))) +widenTestWithStringPromotion( + DecimalType(2, 1), DoubleType, Some(DoubleType)) +widenTestWithStringPromotion( + DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1))) +widenTestWithStringPromotion( + DoubleType, DecimalType(2, 1), Some(DoubleType)) +widenTestWithStringPromotion( + LongType, DecimalType(2, 1), Some(DecimalType(21, 1))) + +// ArrayType +widenTestWithStringPromotion( + ArrayType(ShortType, containsNull = true), + ArrayType(DoubleType, containsNull = false), + Some(ArrayType(DoubleType, containsNull = true))) +widenTestWithStringPromotion( + ArrayType(TimestampType, containsNull = false), + ArrayType(StringType, containsNull = true), + Some(ArrayType(StringType, containsNull = true))) +widenTestWithStringPromotion( + ArrayType(ArrayType(IntegerType), containsNull = false), + ArrayType(ArrayType(LongType), containsNull = false), + Some(ArrayType(ArrayType(LongType), containsNull = false))) + +// Without string promotion +widenTestWithoutStringPromotion(IntegerType, StringType, None) --- End diff -- `LongType` test was removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 Thanks @cloud-fan for your detailed review. I will keep those comments in mind. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16898: [SPARK-19563][SQL] advoid unnecessary sort in Fil...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16898#discussion_r100692348 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -134,8 +142,26 @@ object FileFormatWriter extends Logging { // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called. committer.setupJob(job) + val bucketIdExpression = bucketSpec.map { spec => +// Use `HashPartitioning.partitionIdExpression` as our bucket id expression, so that we can +// guarantee the data distribution is same between shuffle and bucketed data source, which +// enables us to only shuffle one side when join a bucketed table and a normal one. +HashPartitioning(bucketColumns, spec.numBuckets).partitionIdExpression + } + // We should first sort by partition columns, then bucket id, and finally sorting columns. + val requiredOrdering = (partitionColumns ++ bucketIdExpression ++ sortColumns) +.map(SortOrder(_, Ascending)) + val actualOrdering = queryExecution.executedPlan.outputOrdering + // We can still avoid the sort if the required ordering is [partCol] and the actual ordering + // is [partCol, anotherCol]. + val rdd = if (requiredOrdering == actualOrdering.take(requiredOrdering.length)) { +queryExecution.toRdd + } else { +SortExec(requiredOrdering, global = false, queryExecution.executedPlan).execute() --- End diff -- Oh, I met this case before, IIRC. This fails to compile in Scala 2.10. I guess it should be

```
SortExec(requiredOrdering, global = false, child = queryExecution.executedPlan).execute()
```

because it seems the compiler gets confused by the mix of positional and named arguments. (Mixing them this way is actually invalid syntax in Python.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16898: [SPARK-19563][SQL] advoid unnecessary sort in Fil...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16898#discussion_r100692415 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -134,8 +142,26 @@ object FileFormatWriter extends Logging { // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called. committer.setupJob(job) + val bucketIdExpression = bucketSpec.map { spec => +// Use `HashPartitioning.partitionIdExpression` as our bucket id expression, so that we can +// guarantee the data distribution is same between shuffle and bucketed data source, which +// enables us to only shuffle one side when join a bucketed table and a normal one. +HashPartitioning(bucketColumns, spec.numBuckets).partitionIdExpression + } + // We should first sort by partition columns, then bucket id, and finally sorting columns. + val requiredOrdering = (partitionColumns ++ bucketIdExpression ++ sortColumns) +.map(SortOrder(_, Ascending)) + val actualOrdering = queryExecution.executedPlan.outputOrdering + // We can still avoid the sort if the required ordering is [partCol] and the actual ordering + // is [partCol, anotherCol]. + val rdd = if (requiredOrdering == actualOrdering.take(requiredOrdering.length)) { +queryExecution.toRdd + } else { +SortExec(requiredOrdering, global = false, queryExecution.executedPlan).execute() --- End diff -- Yea, it seems it complains:

```
[error] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala:160: not enough arguments for method apply: (sortOrder: Seq[org.apache.spark.sql.catalyst.expressions.SortOrder], global: Boolean, child: org.apache.spark.sql.execution.SparkPlan, testSpillFrequency: Int)org.apache.spark.sql.execution.SortExec in object SortExec.
[error] Unspecified value parameter child.
[error] SortExec(requiredOrdering, global = false, queryExecution.executedPlan).execute()
[error]
```

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
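As a self-contained illustration of the quirk discussed in the two comments above, here is a minimal sketch using a hypothetical case class standing in for `SortExec` (not Spark code); the failing call is left commented out so the sketch compiles on any Scala version, since the failure is specific to Scala 2.10 per the build log.

```scala
// Stand-in for SortExec's signature: (sortOrder, global, child, testSpillFrequency).
case class SortExecLike(
    sortOrder: Seq[String],
    global: Boolean,
    child: String,
    testSpillFrequency: Int = 0)

object NamedArgsDemo {
  val plan = "executedPlan"
  // Reportedly rejected by Scala 2.10 (see the build log above): after the named
  // argument `global = false`, the positional `plan` is not matched to `child`,
  // so the compiler reports "Unspecified value parameter child".
  // val bad = SortExecLike(Seq("a"), global = false, plan)
  val ok = SortExecLike(Seq("a"), global = false, child = plan) // compiles everywhere
}
```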
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100716252 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Yes, it is true that the type dispatch order was changed, but `findTightestCommonType` does not handle `DecimalType`, so the results would be the same. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 Do you mean two PRs, one for cleaning up the logic here and one for supporting array type coercion? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100716751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- @cloud-fan refactored this logic recently and I believe he didn't miss this part. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100723132 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Aha, thank you for correcting me. I overlooked that, but the result should still be the same, shouldn't it?

- `DecimalType.isWiderThan`

  ```
  (p1 - s1) >= (p2 - s2) && s1 >= s2
  ```

- `DecimalPrecision.widerDecimalType`

  ```
  max(s1, s2) + max(p1 - s1, p2 - s2), max(s1, s2)
  ```

If they differ, we were already applying different type coercion rules between `findWiderTypeWithoutStringPromotion` and `findWiderTypeForTwo`; I guess we should make them consistent, given https://github.com/apache/spark/pull/14439 ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
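To make the two rules quoted above concrete, here is a self-contained worked example using a plain `Dec(p, s)` stand-in rather than Spark's `DecimalType`; the formulas are copied from the comment, and the example values match the `LongType` / `DecimalType(2, 1)` test case quoted earlier in this thread.

```scala
object DecimalWideningDemo {
  // Stand-in for DecimalType(precision, scale).
  case class Dec(p: Int, s: Int)

  // DecimalType.isWiderThan, per the rule quoted above.
  def isWiderThan(a: Dec, b: Dec): Boolean =
    (a.p - a.s) >= (b.p - b.s) && a.s >= b.s

  // DecimalPrecision.widerDecimalType, per the rule quoted above.
  def wider(a: Dec, b: Dec): Dec = {
    val s = math.max(a.s, b.s)
    Dec(s + math.max(a.p - a.s, b.p - b.s), s)
  }

  // LongType corresponds to Dec(20, 0); widening against Dec(2, 1) yields Dec(21, 1),
  // matching the widenTestWithStringPromotion(LongType, DecimalType(2, 1), ...) case
  // quoted earlier, and the widened type is wider than both inputs.
  val w: Dec = wider(Dec(20, 0), Dec(2, 1)) // Dec(21, 1)
  val check: Boolean = isWiderThan(w, Dec(20, 0)) && isWiderThan(w, Dec(2, 1)) // true
}
```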
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 I see what you mean. The code paths are now different. Let me try to investigate it and split them. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 @gatorsmile, can we merge this and then add test cases for them separately? It seems the results are the same. I ran two tests, as below:

```scala
val integralTypes = IndexedSeq(
  ByteType, ShortType, IntegerType, LongType)
val decimals = (-38 to 38).flatMap { p =>
  (-38 to 38).flatMap(s => allCatch opt DecimalType(p, s))
}
assert(decimals.nonEmpty)

integralTypes.foreach { it =>
  test(s"$it test") {
    decimals.foreach { d =>
      // From TypeCoercion.findWiderTypeForTwo
      val maybeType1 = (d, it) match {
        case (d: DecimalType, t: IntegralType) =>
          Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
        case _ => None
      }

      // From TypeCoercion.findTightestCommonType
      val maybeType2 = (d, it) match {
        case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) => Some(t1)
        case _ => None
      }

      if (maybeType2.isDefined) {
        val t1 = maybeType1.get
        val t2 = maybeType2.get
        assert(t1 == t2)
      }
    }
  }
}
```

```scala
val integralTypes = IndexedSeq(
  ByteType, ShortType, IntegerType, LongType)
val decimals = (-38 to 38).flatMap { p =>
  (-38 to 38).flatMap(s => allCatch opt DecimalType(p, s))
}
assert(decimals.nonEmpty)

integralTypes.foreach { it =>
  test(s"$it test") {
    val widenDecimals = decimals.flatMap { d =>
      // From TypeCoercion.findWiderTypeForTwo
      (d, it) match {
        case (d: DecimalType, t: IntegralType) =>
          Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
        case _ => None
      }
    }.toSet
    val tightDecimals = decimals.flatMap { d =>
      // From TypeCoercion.findTightestCommonType
      (d, it) match {
        case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) => Some(t1)
        case _ => None
      }
    }.toSet
    assert(widenDecimals.nonEmpty)
    assert(tightDecimals.nonEmpty)
    assert(tightDecimals.subsetOf(widenDecimals))
  }
}
```

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100730262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- I see. Thank you for catching it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
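For readers following this thread, a sketch of how the refactored `findWiderTypeForTwo` plausibly chains the helpers via `orElse`, extrapolated from the `findTightestCommonType(t1, t2)` fragment visible in the diff; the `findWiderTypeForDecimal` name and the exact shape of the array case are assumptions, not confirmed by the diff.

```scala
// Inside object TypeCoercion; helper names other than findTightestCommonType are assumed.
def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
  findTightestCommonType(t1, t2)
    .orElse(findWiderTypeForDecimal(t1, t2)) // assumed decimal helper
    .orElse(stringPromotion(t1, t2))         // helper sketched earlier in this digest
    .orElse((t1, t2) match {
      // The rule this PR adds: widen array element types recursively,
      // keeping nullability if either side's elements are nullable.
      case (ArrayType(et1, n1), ArrayType(et2, n2)) =>
        findWiderTypeForTwo(et1, et2).map(ArrayType(_, n1 || n2))
      case _ => None
    })
}
```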
[GitHub] spark pull request #16882: [SPARK-19544][SQL] Improve error message when som...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16882#discussion_r100779764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -321,12 +321,12 @@ trait CheckAnalysis extends PredicateHelper { // Check if the data types match. dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) => // SPARK-18058: we shall not care about the nullability of columns -if (!dt1.sameType(dt2)) { +if (TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty) { failAnalysis( s""" |${operator.nodeName} can only be performed on tables with the compatible - |column types. $dt1 <> $dt2 at the ${ordinalNumber(ci)} column of - |the ${ordinalNumber(ti + 1)} table + |column types. ${dt1.simpleString} <> ${dt2.simpleString} at the --- End diff -- Sure, let me change it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16882: [SPARK-19544][SQL] Improve error message when some colum...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16882 Thank you @hvanhovell --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 (I just rebased and added `private[analysis]` for consistency) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16741: [SPARK-19402][DOCS] Support LaTex inline formula ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16741#discussion_r100924150 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -135,13 +135,13 @@ abstract class MLWriter extends BaseReadWrite with Logging { } /** - * Trait for classes that provide [[MLWriter]]. + * Trait for classes that provide `MLWriter`. --- End diff -- Hmm. That's weird. At least, it should warn, because I copied and pasted each class name from the messages and went to the line number. Let me double-check and get back to you. Maybe I made mistakes for a few. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16741: [SPARK-19402][DOCS] Support LaTex inline formula ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16741#discussion_r100958021 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -135,13 +135,13 @@ abstract class MLWriter extends BaseReadWrite with Logging { } /** - * Trait for classes that provide [[MLWriter]]. + * Trait for classes that provide `MLWriter`. --- End diff -- @jkbradley, yes, it seems fine for both. I think I changed them given `sealed trait BaseReadWrite`...

```
[warn] .../spark/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala:143: Could not find any member to link for "MLWriter".
[warn] /**
[warn] ^
```

and I think I swept them up here because they looked identical. I am okay with reviving the identified ones. My (maybe nitpicking) concern is that we should be able to identify the problematic cases explicitly (which I guess many of us, including me, have failed to do) and know what to do in each case. Otherwise, I think we should prefer backquotes, because even if reviving links works for now in some cases, we could easily introduce other breaks again when we change the code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16533: [SPARK-19160][PYTHON][SQL] Add udf decorator
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16533 Thanks for cc'ing me. I like this Pythonic way. +1, and it looks okay to me too despite the trick in argument checking. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 Thank you @gatorsmile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100986732 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -58,49 +58,54 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (greater or equal to 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +Option(res).map(_.head).orNull } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. + * @see `[[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]]` for detailed --- End diff -- nit: `DataFrameStatsFunctions` -> `DataFrameStatFunctions`, or remove it. For example, just

```
`approxQuantile(String, Array[Double], Double)`
```

We could just wrap them in backticks without `[[ ... ]]` in general. It seems the Scaladoc-specific annotation also does not work to disambiguate the argument types:

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameStatFunctions.java:43: error: unexpected content
[error] * @see {@link DataFrameStatFunctions.approxQuantile(col:Str* approxQuantile)} for
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameStatFunctions.java:45: error: unexpected text
[error] * @see #approxQuantile(String, Array[Double], Double) for detailed description.
[error] ^
```

I guess it does not necessarily need to be a link if making it one breaks the build. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
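A short sketch of the backtick style suggested in the comment above, applied to a Scaladoc `@see` tag; the enclosing class and the `???` body are placeholders so the fragment compiles, and are not the real `DataFrameStatFunctions` implementation.

```scala
final class DataFrameStatFunctionsSketch {
  /**
   * Calculates the approximate quantiles of numerical columns of a DataFrame.
   *
   * @see `approxQuantile(String, Array[Double], Double)` for a detailed description.
   */
  def approxQuantile(
      cols: Array[String],
      probabilities: Array[Double],
      relativeError: Double): Array[Array[Double]] = ???
}
```

Plain backticks render as code in both Scaladoc and the generated Javadoc, so this style avoids the `[[ ... ]]` link-resolution failures shown in the unidoc error log above.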
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100987139 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -58,49 +58,54 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (greater or equal to 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +Option(res).map(_.head).orNull } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. + * @see `[[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]]` for detailed --- End diff -- It seems the breaks have queued up a bit. Let me sweep them soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100992640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -58,49 +58,52 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (greater or equal to 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +Option(res).map(_.head).orNull } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. * - * Note that rows containing any null or NaN values values will be removed before - * calculation. * @param cols the names of the numerical columns * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (>= 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities of each column * - * @note Rows containing any NaN values will be removed before calculation + * @note Rows containing any null or NaN values will be removed before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.2.0 */ def approxQuantile( cols: Array[String], probabilities: Array[Double], relativeError: Double): Array[Array[Double]] = { -StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols, - probabilities, relativeError).map(_.toArray).toArray +// TODO: Update NaN/null handling to keep consistent with the single-column version --- End diff -- Should we move this TODO to a JIRA? I guess it is generally better not to leave TODOs in the code but to file them as JIRAs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100996427 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -58,49 +58,52 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (greater or equal to 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +Option(res).map(_.head).orNull } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for --- End diff -- I am sorry. Actually, I meant removing `DataFrameStatFunctions` and leaving the method reference as it is, since it is in the same class. Nevertheless, FWIW, I am fine with removing it as is, given the other functions here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100997576 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -58,49 +58,52 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (greater or equal to 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +Option(res).map(_.head).orNull } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. * - * Note that rows containing any null or NaN values values will be removed before - * calculation. * @param cols the names of the numerical columns * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (>= 0). + * @param relativeError The relative target precision to achieve (greater than or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities of each column * - * @note Rows containing any NaN values will be removed before calculation + * @note Rows containing any null or NaN values will be removed before calculation. If + * the dataframe is empty or all rows contain null or NaN, null is returned. * * @since 2.2.0 */ def approxQuantile( cols: Array[String], probabilities: Array[Double], relativeError: Double): Array[Array[Double]] = { -StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols, - probabilities, relativeError).map(_.toArray).toArray +// TODO: Update NaN/null handling to keep consistent with the single-column version --- End diff -- I just saw your comment above. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16926: [MINOR] Fix javadoc8 break
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16926 [MINOR] Fix javadoc8 break

## What changes were proposed in this pull request?

The errors below seem to be caused by unidoc, which does not understand double-commented blocks.

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:69: error: class, interface, or enum expected
[error] * MapGroupsWithStateFunction<String, Integer, Integer, String> mappingFunction =
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:69: error: class, interface, or enum expected
[error] * MapGroupsWithStateFunction<String, Integer, Integer, String> mappingFunction =
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:70: error: class, interface, or enum expected
[error] *new MapGroupsWithStateFunction<String, Integer, Integer, String>() {
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:70: error: class, interface, or enum expected
[error] *new MapGroupsWithStateFunction<String, Integer, Integer, String>() {
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:72: error: illegal character: '#'
[error] * @Override
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:72: error: class, interface, or enum expected
[error] * @Override
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error] * public String call(String key, Iterator<Integer> value, KeyedState<Integer> state) {
[error]^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error] * public String call(String key, Iterator<Integer> value, KeyedState<Integer> state) {
[error]^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error] * public String call(String key, Iterator<Integer> value, KeyedState<Integer> state) {
[error]^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error] * public String call(String key, Iterator<Integer> value, KeyedState<Integer> state) {
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error] * public String call(String key, Iterator<Integer> value, KeyedState<Integer> state) {
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:76: error: class, interface, or enum expected
[error] * boolean shouldRemove = ...; // Decide whether to remove the state
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:77: error: class, interface, or enum expected
[error] * if (shouldRemove) {
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:79: error: class, interface, or enum expected
[error] * } else {
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:81: error: class, interface, or enum expected
[error] *state.update(newState); // Set the new state
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:82: error: class, interface, or enum expected
[error] * }
[error] ^
[error] .../forked/spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:85: error: class, interface, or enum expected
[error] * state.update(initialState);
[error] ^
[error] .../forked/spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:86: error: class, interface, or enum expected
[error] *}
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:90: error: class, interface, or enum expected
[error] *
[error] ^
[error] .../spark/s
```
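To illustrate the "double-commented block" failure mode named above: the quoted `[error]` lines come from a Java usage example embedded inside a Scaladoc comment, roughly of the following shape. This is a hedged reconstruction assembled from the error log, not the actual `KeyedState` documentation; when genjavadoc re-emits the comment into generated Java sources, javadoc 8 chokes on tokens like `@Override` and the generic angle brackets inside it.

```scala
/**
 * Hypothetical reconstruction of the documented Java usage example:
 *
 * {{{
 *   MapGroupsWithStateFunction<String, Integer, Integer, String> mappingFunction =
 *     new MapGroupsWithStateFunction<String, Integer, Integer, String>() {
 *       @Override
 *       public String call(String key, Iterator<Integer> value, KeyedState<Integer> state) {
 *         boolean shouldRemove = ...; // Decide whether to remove the state
 *         ...
 *       }
 *     };
 * }}}
 */
trait KeyedStateLike[S]
```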
[GitHub] spark issue #16926: [MINOR] Fix javadoc8 break
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16926 Note that this many errors seem to hide further errors in the error messages. The output was starting to mask more of them, so I proposed this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16926: [MINOR] Fix javadoc8 break
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16926 cc @srowen. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16927: [WIP][SPARK-19571][R] Fix SparkR test break on Wi...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16927 [WIP][SPARK-19571][R] Fix SparkR test break on Windows via AppVeyor

## What changes were proposed in this pull request?

It seems winutils for Hadoop 2.6.5 does not exist for now in https://github.com/steveloughran/winutils. This breaks the SparkR tests on Windows, so this PR proposes to use the winutils built for Hadoop 2.6.4 for now.

## How was this patch tested?

Manually via AppVeyor

**Before** https://ci.appveyor.com/project/spark-test/spark/build/627-r-test-break

**After** https://ci.appveyor.com/project/spark-test/spark/build/629-r-test-break

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark spark-r-windows-break Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16927 commit 58af8fc830bb4c51bd80bc46da669125ec6339f6 Author: hyukjinkwon Date: 2017-02-14T14:48:37Z Fix SparkR test break on Windows via AppVeyor --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16927 cc @felixcheung and @shivaram --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16927 (I think I should cc @steveloughran too just FYI) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16929: [SPARK-19595][SQL] Do not allow json array in fro...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16929 [SPARK-19595][SQL] Do not allow json array in from_json

## What changes were proposed in this pull request?

Currently, it only reads a single row when the input is a JSON array. So, the code below:

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val schema = StructType(StructField("a", IntegerType) :: Nil)
Seq(("""[{"a": 1}, {"a": 2}]""")).toDF("struct").select(from_json(col("struct"), schema)).show()
```

prints

```
+--------------------+
|jsontostruct(struct)|
+--------------------+
|                 [1]|
+--------------------+
```

We may consider supporting this as a generator expression, but I guess it'd be arguable. So, this PR simply suggests disallowing JSON arrays in `from_json` for now.

**After**

```
+--------------------+
|jsontostruct(struct)|
+--------------------+
|                null|
+--------------------+
```

## How was this patch tested?

Unit test in `JsonExpressionsSuite` and manual test

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark disallow-array Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16929.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16929 commit acbce26cd983c4e3510a8db707196e3cd848aba2 Author: hyukjinkwon Date: 2017-02-14T15:37:00Z Do not allow json array in from_json --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
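A hedged sketch of what the unit test mentioned above might look like as a fragment inside `JsonExpressionsSuite`; the `JsonToStruct(schema, options, child)` constructor shape and the `checkEvaluation` helper are assumptions based on that suite's conventions of the time, not confirmed by this PR.

```scala
test("from_json - json array input yields null") {
  val schema = StructType(StructField("a", IntegerType) :: Nil)
  val input = """[{"a": 1}, {"a": 2}]"""
  // A JSON array does not match a struct schema, so the result should be null
  // rather than silently parsing only the first element.
  checkEvaluation(JsonToStruct(schema, Map.empty, Literal(input)), null)
}
```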
[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16927 Yea, I agree that the error message was hard to read. Let me try to raise this issue in a JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16927: [SPARK-19571][R] Fix SparkR test break on Windows via Ap...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16927 cc @srowen too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org