[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-12-06 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22575#discussion_r239500890 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -631,6 +631,33 @@ object SQLConf { .intConf

[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-12-03 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22575#discussion_r238336135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/SQLStreamingSink.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed

[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-12-03 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22575#discussion_r238329995 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -631,6 +631,33 @@ object SQLConf { .intConf

[GitHub] spark pull request #23197: [SPARK-26165][Optimizer] Filter Query Date and Ti...

2018-12-03 Thread sujith71955
Github user sujith71955 closed the pull request at: https://github.com/apache/spark/pull/23197 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...

2018-12-03 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/23197 Closing the PR as per Li's comment. Thanks for the suggestion, Li.

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 cc @wzhfy

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 Do we need to handle this scenario? Do we have any PR for handling this issue?

[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/23197 cc marmbrus yhuai srowen vinodkc

[GitHub] spark pull request #23197: [SPARK-26165][Optimizer] Date and Timestamp colum...

2018-12-01 Thread sujith71955
GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/23197 [SPARK-26165][Optimizer] Date and Timestamp column expressions are getting converted to string type Date and Timestamp column expressions are getting converted to string type in less than

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-11-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22575 > > Can you send a mail to Ryan Blue about adding this SPIP topic to tomorrow's meeting? The meeting will be held tomorrow at 05:00 pm PST. If you confirm then we can also attend the m

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-11-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22575 Can you send a mail to Ryan Blue about adding this SPIP topic to tomorrow's meeting? The meeting will be held tomorrow at 05:00 pm PST. If you confirm then we can also attend the meeting

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-11-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22575 cc @koeninger

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-11-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22575 ![image](https://user-images.githubusercontent.com/12999161/49129177-ab056680-f2f4-11e8-8f71-4695ebc045c1.png) There is a DatasourceV2 community synch meetup tomorrow which

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-11-25 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22575 @stczwd Can you provide a detailed design document for this PR, mentioning the scenarios being handled and the constraints, if any? This will give a complete picture of this PR. Thanks

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-08 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 I think this issue should not be in the Improvement category; it should be a Critical Bug, since it affects normal join query performance. Hope we address this issue. "Insert query

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-08 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231796228 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-08 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231795771 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-08 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231795480 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-08 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231794961 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-07 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231790964 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-07 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231789742 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-07 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231789510 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-11-07 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r231785137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -183,13 +183,14 @@ case

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 Thanks for the comment, Sean. There are certain areas where I found inconsistencies; if I get some input from experts, I think I can update the PR, if we are planning to tackle

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 cc @srowen

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan @HyukjinKwon @wangyum Any suggestions on this issue? Because of this defect we are facing some performance issues in our customer environment. Requesting you all to please have

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-28 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan @HyukjinKwon @srowen As a result of my observations above: a) I have a doubt: if we expect the stats to estimate the data size from the files, then why

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-28 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r228758621 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -193,6 +193,16 @@ private[hive] class

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-22 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan Shall I update this PR based on the second approach? Will that be fine? I also tested the second approach, and the use cases mentioned in this JIRA are working fine

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > In order to make this flow consistent, either > a) we need to record HiveStats for the insert command flow and always consider these stats while computing > OR > b) As men

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 In order to make this flow consistent, either a) we need to record HiveStats for the insert command flow and always consider these stats while computing OR b) As mentioned above

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > I think the cost of getting the stats from `HadoopFileSystem` may be quite high. Then we should always depend on HiveStats to get the statistics, which is happening now also but partia

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan One solution I can think of: in the DetermineStats flow we can add one more condition to not update the stats for convertible relations, since we always get the stats from

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan Please find my understanding of the flow below; it's a bit tricky :) Let's elaborate on this flow; maybe we will get more suggestions. Step 1: insert command

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r226515799 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -193,6 +193,16 @@ private[hive] class

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r226203075 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -193,6 +193,16 @@ private[hive] class

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r226201756 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -193,6 +193,16 @@ private[hive] class

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r226197174 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -1051,7 +1051,8 @@ class StatisticsSuite extends

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r226192210 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -193,6 +193,16 @@ private[hive] class

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-17 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @srowen @cloud-fan @HyukjinKwon @felixcheung @wangyum I think this PR should also solve the problem mentioned in SPARK-25403. Please review and provide any suggestions. Thanks all

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-17 Thread sujith71955
GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/22758 [SPARK-25332][SQL] Instead of broadcast hash join ,Sort merge join has selected when restart spark-shell/spark-JDBC for hive provider ## What changes were proposed in this pull request

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/16677 Yes, sure, I will create a ticket for this issue and keep you in the loop. Thanks

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/16677 Mainly, I think we are trying to interpolate the number of partitions

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/16677 @viirya I have a use case where a normal query takes around 5 seconds, whereas the same query with limit 5000 takes around 17 seconds. When I checked, I could find a bottleneck

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/16677 @viirya Are we also looking to optimize the CollectLimitExec part? I saw that in SparkPlan we have an executeTake() method which basically interpolates the number of partitions and processes the limit
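
For readers following this thread, the "interpolation" mentioned in the comment above can be illustrated with a rough sketch. This is not Spark's code: the function name `next_partitions_to_try`, its parameters, and the default scale-up factor are all assumptions made for illustration. The idea is that a limit query scans a few partitions, looks at how many rows came back, and uses that ratio to guess how many partitions to scan in the next round.

```python
import math


def next_partitions_to_try(limit, rows_so_far, parts_scanned, scale_up_factor=4):
    """Guess how many partitions to scan next for a LIMIT query.

    Hypothetical helper for illustration only; names and the default
    scale_up_factor are assumptions, not taken from the thread.
    """
    if parts_scanned == 0:
        # First round: probe a single partition.
        return 1
    factor = max(scale_up_factor, 2)
    if rows_so_far == 0:
        # Nothing found yet, so grow the scan geometrically.
        return parts_scanned * factor
    remaining = limit - rows_so_far
    # Extrapolate from the rows seen per scanned partition,
    # overestimating by 50% to reduce the number of rounds.
    estimate = math.ceil(1.5 * remaining * parts_scanned / rows_so_far)
    # Never grow faster than the geometric cap, and always try at least one.
    return min(max(estimate, 1), parts_scanned * factor)
```

Under these assumptions, a large limit over sparse partitions takes several rounds, each launching its own job, which may be one reason a query with a limit can run slower than the same query without one.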

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-10-03 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 > > Can we update the PR to use `description.uuid` first? > > Updated FileFormatWriter with description.uuid, attaching the verification snapshot . > ![imag

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-10-03 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 > Can we update the PR to use `description.uuid` first? Updated FileFormatWriter with description.uuid, attaching the verification snapshot . ![image](https://u

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-10-03 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 When I dug into the code, I could see that in SparkHadoopWriter the job id is initialized while creating the job context itself. ![image](https://user-images.githubusercontent.com/12999161

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-10-03 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 @srowen @cloud-fan I tested the SparkHadoopWriter flow with the steps below, and I could see the job id printed properly in the log, so is it fine to update this flow also

[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

2018-09-28 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22572#discussion_r221371957 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -183,15 +183,16 @@ object FileFormatWriter

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-09-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 cc @gatorsmile

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-09-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 > Is the value logged here always null? > I am not sure if it's meaningful to log mapreduce.job.id, especially given its name. If there's no meaningful job ID here do we care about it

[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

2018-09-27 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22572 cc @cloud-fan @srowen

[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

2018-09-27 Thread sujith71955
GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/22572 [SPARK-25521][SQL]Job id showing null in the logs when insert into command Job is finished. ## What changes were proposed in this pull request? ``As part of insert command

[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-21 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r219443041 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -321,8 +321,19 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-21 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r219442479 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -2348,4 +2348,41 @@ class HiveDDLSuite

[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-21 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r219440831 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -67,12 +67,15 @@ case class CreateDatabaseCommand

[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-21 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r219440456 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -67,12 +67,15 @@ case class CreateDatabaseCommand

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-21 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22466 @srowen @HyukjinKwon I think this can be a risk if the location of the newly created database points to an existing one; if the user drops the db, the data of both dbs will be lost

[GitHub] spark issue #22466: [SPARK-25464][SQL]When database is dropped all the data ...

2018-09-21 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22466 @sandeep-katta Can you please update the title? As per your description, it seems the data will be dropped if the location of the newly created database points to some existing

[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...

2018-09-17 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22396 Any idea why some parts of the text are highlighted in blue?

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217920696 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217920417 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-14 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217802972 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-14 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217673140 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-14 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217672824 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-12 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217151516 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...

2018-09-12 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22396 > That's fine, and worth adding to the "Docs Text" field in SPARK-23425 as it will then also go in release notes. What about a quick test case for this too? Added a UT

[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22396 @gatorsmile @srowen

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-11 Thread sujith71955
GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/22396 [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for loadtable command. What changes were proposed in this pull request Updated the Migration guide for the behavior changes

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216771752 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216728326 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216694466 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216693375 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216693154 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216692677 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216685077 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216638992 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216638725 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-10 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216425911 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-09-10 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r216417629 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand

[GitHub] spark pull request #22199: [SPARK-25073][Yarn] AM and Executor Memory valida...

2018-08-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22199#discussion_r212396099 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -338,13 +338,14 @@ private[spark] class Client

[GitHub] spark pull request #22199: [SPARK-25073][Yarn] AM and Executor Memory valida...

2018-08-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22199#discussion_r212392740 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -338,13 +338,14 @@ private[spark] class Client

[GitHub] spark pull request #22199: [SPARK-25073][Yarn] AM and Executor Memory valida...

2018-08-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22199#discussion_r212370528 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -338,13 +338,14 @@ private[spark] class Client

[GitHub] spark pull request #22199: [SPARK-25073][Yarn] AM and Executor Memory valida...

2018-08-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22199#discussion_r212356651 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -338,13 +338,14 @@ private[spark] class Client

[GitHub] spark pull request #22199: [SPARK-25073][Yarn] AM and Executor Memory valida...

2018-08-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22199#discussion_r212356336 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -338,13 +338,14 @@ private[spark] class Client

[GitHub] spark pull request #22199: [SPARK-25073][Yarn] AM and Executor Memory valida...

2018-08-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22199#discussion_r212354862 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -338,13 +338,14 @@ private[spark] class Client

[GitHub] spark pull request #22199: [SPARK-25073][SQL]When wild card is been used in ...

2018-08-23 Thread sujith71955
GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/22199 [SPARK-25073][SQL]When wild card is been used in load command system ## What changes were proposed in this pull request? When the yarn.nodemanager.resource.memory-mb

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-21 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @srowen Fixed the pending comments. Kindly recheck. Thanks

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-20 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @srowen Got your point, I will update

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-20 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @srowen Make this method private -- can be right? This is more like a Util method that any feature dealing with the file system can use to form a path instance without

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-17 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 Working fine with latest code. Thanks !!!

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-17 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 Did some testing in my cluster with updated code for verifying the load command with hdfs paths, please find the test results. Local path testing is already covered in my UT

[GitHub] spark pull request #22120: [SPARK-25131]Event logs missing applicationAttemp...

2018-08-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22120#discussion_r210599272 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -62,6 +62,10 @@ private

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-13 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 Hi all, could you take another look at this PR and let me know whether it looks fine? Thanks

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-08 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @gatorsmile I added the comment. Thanks

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-07 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 Updated the PR to address the comment from Sean. Hope I addressed all the issues :)

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-02 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 To reiterate: this PR was intended to fix the wildcard character issue in HDFS file system scenarios. With the current solution we are also able

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-02 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 I think "support wildcard" is a confusing term :)

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-02 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 No other changes in the load command behavior
