[jira] [Updated] (SPARK-26919) change maven default compile java home
[ https://issues.apache.org/jira/browse/SPARK-26919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-26919: -- Attachment: p1.png > change maven default compile java home > -- > > Key: SPARK-26919 > URL: https://issues.apache.org/jira/browse/SPARK-26919 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.1 >Reporter: daile >Priority: Critical > Attachments: p1.png > > > When I use "build/mvn -DskipTests clean package", the default java home > configuration is "${java.home}". I tried macOS and Windows environments and > found that the default java.home is */jre, but the JRE environment does not > have the javac compile command. So I think it can be replaced with the system > environment variable (JAVA_HOME), and the build then compiles successfully. > !image-2019-02-19-10-25-02-872.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26919) change maven default compile java home
daile created SPARK-26919: - Summary: change maven default compile java home Key: SPARK-26919 URL: https://issues.apache.org/jira/browse/SPARK-26919 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.4.1 Reporter: daile Attachments: p1.png When I use "build/mvn -DskipTests clean package", the default java home configuration is "${java.home}". I tried macOS and Windows environments and found that the default java.home is */jre, but the JRE environment does not have the javac compile command. So I think it can be replaced with the system environment variable (JAVA_HOME), and the build then compiles successfully. !image-2019-02-19-10-25-02-872.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26948) vertex and edge rowkey upgrade and support multiple types?
daile created SPARK-26948: - Summary: vertex and edge rowkey upgrade and support multiple types? Key: SPARK-26948 URL: https://issues.apache.org/jira/browse/SPARK-26948 Project: Spark Issue Type: Improvement Components: GraphX Affects Versions: 2.4.0 Reporter: daile Currently only Long is supported, but most graph databases use strings as the primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
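Pending such support, a common workaround is to map string primary keys to synthetic Long vertex ids and keep the lookup table next to the graph. A minimal sketch, assuming an existing SparkContext `sc` and hypothetical key values:

{code:java}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// String keys as they would come from an external graph database (hypothetical values).
val names: RDD[String] = sc.parallelize(Seq("alice", "bob", "carol"))
// Assign a synthetic Long id to each string key and keep the mapping around.
val idByName: RDD[(String, VertexId)] = names.zipWithUniqueId()
val vertices: RDD[(VertexId, String)] = idByName.map(_.swap)
// Build the graph on the synthetic ids; edges omitted here for brevity.
val graph: Graph[String, Int] = Graph(vertices, sc.emptyRDD[Edge[Int]])
{code}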
[jira] [Resolved] (SPARK-26919) change maven default compile java home
[ https://issues.apache.org/jira/browse/SPARK-26919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile resolved SPARK-26919. --- Resolution: Done Fix Version/s: 2.4.0 > change maven default compile java home > -- > > Key: SPARK-26919 > URL: https://issues.apache.org/jira/browse/SPARK-26919 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.1 >Reporter: daile >Priority: Critical > Fix For: 2.4.0 > > Attachments: p1.png > > > When I use "build/mvn -DskipTests clean package", the default java home > configuration is "${java.home}". I tried macOS and Windows environments and > found that the default java.home is */jre, but the JRE environment does not > have the javac compile command. So I think it can be replaced with the system > environment variable (JAVA_HOME), and the build then compiles successfully. > !image-2019-02-19-10-25-02-872.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27336) Incorrect DataSet.summary() result
[ https://issues.apache.org/jira/browse/SPARK-27336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920557#comment-16920557 ] daile commented on SPARK-27336: --- I will check this issue > Incorrect DataSet.summary() result > -- > > Key: SPARK-27336 > URL: https://issues.apache.org/jira/browse/SPARK-27336 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > Attachments: test.csv > > > There is a single data point in the minimum_nights column that is 1.0E8 out > of 8k records, but .summary() says it is the 75% and the max. > I compared this with approxQuantile, and approxQuantile for 75% gave the > correct value of 30.0. > To reproduce: > {code:java} > scala> val df = > spark.read.format("csv").load("test.csv").withColumn("minimum_nights", > '_c0.cast("Int")) > df: org.apache.spark.sql.DataFrame = [_c0: string, minimum_nights: int] > scala> df.select("minimum_nights").summary().show() > +---+--+ > |summary|minimum_nights| > +---+--+ > | count| 7072| > | mean| 14156.35407239819| > | stddev|1189128.5444975856| > |min| 1| > |25%| 2| > |50%| 4| > |75%| 1| > |max| 1| > +---+--+ > scala> df.stat.approxQuantile("minimum_nights", Array(0.75), 0.1) > res1: Array[Double] = Array(30.0) > scala> df.stat.approxQuantile("minimum_nights", Array(0.75), 0.001) > res2: Array[Double] = Array(30.0) > scala> df.stat.approxQuantile("minimum_nights", Array(0.75), 0.0001) > res3: Array[Double] = Array(1.0E8) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
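For context on the comparison above: approxQuantile documents that a relativeError of 0.0 computes the exact (but more expensive) quantile, which gives a ground truth to check summary() against. A small sketch, reusing the df from the report:

{code:java}
// relativeError = 0.0 requests the exact quantile, at higher computation cost.
val exact = df.stat.approxQuantile("minimum_nights", Array(0.75), 0.0)
// Comparing exact(0) with the 75% row of df.summary() exposes the discrepancy.
{code}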
[jira] [Commented] (SPARK-28694) Add Java/Scala StructuredKerberizedKafkaWordCount examples
[ https://issues.apache.org/jira/browse/SPARK-28694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920665#comment-16920665 ] daile commented on SPARK-28694: --- I will work on this > Add Java/Scala StructuredKerberizedKafkaWordCount examples > -- > > Key: SPARK-28694 > URL: https://issues.apache.org/jira/browse/SPARK-28694 > Project: Spark > Issue Type: Improvement > Components: Examples, Structured Streaming >Affects Versions: 3.0.0 >Reporter: hong dongdong >Priority: Minor > > Currently, the `StructuredKafkaWordCount` example does not support accessing > Kafka using Kerberos authentication. Add a parameter that indicates whether > Kerberos is used. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28694) Add Java/Scala StructuredKerberizedKafkaWordCount examples
[ https://issues.apache.org/jira/browse/SPARK-28694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920677#comment-16920677 ] daile commented on SPARK-28694: --- ok > Add Java/Scala StructuredKerberizedKafkaWordCount examples > -- > > Key: SPARK-28694 > URL: https://issues.apache.org/jira/browse/SPARK-28694 > Project: Spark > Issue Type: Improvement > Components: Examples, Structured Streaming >Affects Versions: 3.0.0 >Reporter: hong dongdong >Priority: Minor > > Currently, the `StructuredKafkaWordCount` example does not support accessing > Kafka using Kerberos authentication. Add a parameter that indicates whether > Kerberos is used. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
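Until the dedicated examples land, the idea can be sketched as follows: Structured Streaming forwards any option prefixed with `kafka.` to the underlying Kafka consumer, so Kerberos can be enabled by passing the usual SASL settings. The broker address, topic, and service name below are placeholders:

{code:java}
// Assumes a JAAS configuration (or delegation token settings) is already in
// place for the driver and executors; this only wires the consumer options.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")    // placeholder
  .option("subscribe", "wordcount-topic")              // placeholder
  .option("kafka.security.protocol", "SASL_PLAINTEXT")
  .option("kafka.sasl.kerberos.service.name", "kafka")
  .load()
{code}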
[jira] [Created] (SPARK-28956) Make it tighter
daile created SPARK-28956: - Summary: Make it tighter Key: SPARK-28956 URL: https://issues.apache.org/jira/browse/SPARK-28956 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.0.0 Reporter: daile {code:java} // code placeholder private def numStd(s: Double): Double = { // TODO: Make it tighter. if (s < 6.0) { 12.0 } else if (s < 16.0) { 9.0 } else { 6.0 } }{code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28956) Make it tighter
[ https://issues.apache.org/jira/browse/SPARK-28956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-28956: -- Priority: Minor (was: Major) > Make it tighter > --- > > Key: SPARK-28956 > URL: https://issues.apache.org/jira/browse/SPARK-28956 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: daile >Priority: Minor > > {code:java} > // code placeholder > private def numStd(s: Double): Double = { > // TODO: Make it tighter. > if (s < 6.0) { > 12.0 > } else if (s < 16.0) { > 9.0 > } else { > 6.0 > } > }{code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
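For context, numStd appears to control how many standard deviations of headroom the sampler keeps when bounding a Poisson(s) count. A sketch of how such a bound is typically consumed, reusing numStd from the snippet above (the helper name getUpperBound is illustrative):

{code:java}
// Illustrative only: Poisson(s) has variance s, so s + numStd(s) * sqrt(s) is
// a high-probability upper bound on the sampled count. Larger s needs fewer
// standard deviations since the distribution concentrates; the TODO asks for
// sharper constants than the 12/9/6 step function.
def getUpperBound(s: Double): Double =
  math.max(s + numStd(s) * math.sqrt(s), 1e-10)
{code}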
[jira] [Commented] (SPARK-28121) String Functions: decode can not accept 'escape' and 'hex' as charset
[ https://issues.apache.org/jira/browse/SPARK-28121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921369#comment-16921369 ] daile commented on SPARK-28121: --- I will work on this > String Functions: decode can not accept 'escape' and 'hex' as charset > - > > Key: SPARK-28121 > URL: https://issues.apache.org/jira/browse/SPARK-28121 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > postgres=# select decode('1234567890','escape'); > decode > > \x31323334353637383930 > (1 row) > {noformat} > {noformat} > spark-sql> select decode('1234567890','escape'); > 19/06/20 01:57:33 ERROR SparkSQLDriver: Failed in [select > decode('1234567890','escape')] > java.io.UnsupportedEncodingException: escape > at java.lang.StringCoding.decode(StringCoding.java:190) > at java.lang.String.(String.java:426) > at java.lang.String.(String.java:491) > ... > spark-sql> select decode('ff','hex'); > 19/08/16 21:44:55 ERROR SparkSQLDriver: Failed in [select decode('ff','hex')] > java.io.UnsupportedEncodingException: hex > at java.lang.StringCoding.decode(StringCoding.java:190) > at java.lang.String.(String.java:426) > at java.lang.String.(String.java:491) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28121) String Functions: decode can not accept 'escape' and 'hex' as charset
[ https://issues.apache.org/jira/browse/SPARK-28121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921369#comment-16921369 ] daile edited comment on SPARK-28121 at 9/3/19 1:27 PM: --- You can use SQL like this: {code:java} select hex('1234567890');{code} was (Author: 726575...@qq.com): i will work on this > String Functions: decode can not accept 'escape' and 'hex' as charset > - > > Key: SPARK-28121 > URL: https://issues.apache.org/jira/browse/SPARK-28121 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > postgres=# select decode('1234567890','escape'); > decode > > \x31323334353637383930 > (1 row) > {noformat} > {noformat} > spark-sql> select decode('1234567890','escape'); > 19/06/20 01:57:33 ERROR SparkSQLDriver: Failed in [select > decode('1234567890','escape')] > java.io.UnsupportedEncodingException: escape > at java.lang.StringCoding.decode(StringCoding.java:190) > at java.lang.String.(String.java:426) > at java.lang.String.(String.java:491) > ... > spark-sql> select decode('ff','hex'); > 19/08/16 21:44:55 ERROR SparkSQLDriver: Failed in [select decode('ff','hex')] > java.io.UnsupportedEncodingException: hex > at java.lang.StringCoding.decode(StringCoding.java:190) > at java.lang.String.(String.java:426) > at java.lang.String.(String.java:491) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
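Expanding on that workaround: Spark SQL's decode/encode pair only accepts real charsets, while hex/unhex cover the 'hex' case from PostgreSQL. A hedged sketch of the Spark-side equivalents:

{code:java}
// hex() of the ASCII bytes of '1234567890', matching the Postgres bytea output.
spark.sql("SELECT hex('1234567890')").show()   // 31323334353637383930
// decode() works once the charset argument is a real charset such as UTF-8.
spark.sql("SELECT decode(encode('1234567890', 'UTF-8'), 'UTF-8')").show()
{code}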
[jira] [Commented] (SPARK-28990) SparkSQL invalid call to toAttribute on unresolved object, tree: *
[ https://issues.apache.org/jira/browse/SPARK-28990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933163#comment-16933163 ] daile commented on SPARK-28990: --- It seems to have been solved in 3.0 > SparkSQL invalid call to toAttribute on unresolved object, tree: * > -- > > Key: SPARK-28990 > URL: https://issues.apache.org/jira/browse/SPARK-28990 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: fengchaoge >Priority: Major > > SparkSQL create table as select from one table which may not exists throw > exceptions like: > {code} > org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to > toAttribute on unresolved object, tree: > {code} > This is not friendly, spark user may have no idea about what's wrong. > Simple sql can reproduce it,like this: > {code} > spark-sql (default)> create table default.spark as select * from default.dual; > {code} > {code} > 2019-09-05 16:27:24,127 INFO (main) [Logging.scala:logInfo(54)] - Parsing > command: create table default.spark as select * from default.dual > 2019-09-05 16:27:24,772 ERROR (main) [Logging.scala:logError(91)] - Failed in > [create table default.spark as select * from default.dual] > org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to > toAttribute on unresolved object, tree: * > at > org.apache.spark.sql.catalyst.analysis.Star.toAttribute(unresolved.scala:245) > at > org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$output$1.apply(basicLogicalOperators.scala:52) > at > org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$output$1.apply(basicLogicalOperators.scala:52) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:296) > at > org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:52) > at > org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:160) > at > org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:148) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:107) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsDown(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperators(AnalysisHelper.scala:73) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29) > 
at org.apache.spark.sql.hive.HiveAnalysis$.apply(HiveStrategies.scala:148) > at org.apache.spark.sql.hive.HiveAnalysis$.apply(HiveStrategies.scala:147) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84) > at > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) > at > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) > at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127) > at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:12
[jira] [Commented] (SPARK-29174) LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source
[ https://issues.apache.org/jira/browse/SPARK-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933181#comment-16933181 ] daile commented on SPARK-29174: --- {code:java} /** * Expected format: * {{{ * INSERT OVERWRITE DIRECTORY * [path] * [OPTIONS table_property_list] * select_statement; * }}} */ {code} > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source > --- > > Key: SPARK-29174 > URL: https://issues.apache.org/jira/browse/SPARK-29174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > *using does not work for insert overwrite when in local but works when > insert overwrite in HDFS directory* > ** > > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite directory > '/user/trash2/' using parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.448 seconds) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' using parquet select * from trash1 a where a.country='PAK'; > Error: org.apache.spark.sql.catalyst.parser.ParseException: > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, > pos 0) > > == SQL == > insert overwrite local directory '/opt/trash2/' using parquet select * from > trash1 a where a.country='PAK' > ^^^ (state=,code=0) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' stored as parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > | | | > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-29174) LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source
[ https://issues.apache.org/jira/browse/SPARK-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-29174: -- Comment: was deleted (was: /** * * Expected format: * {{{ * INSERT OVERWRITE DIRECTORY * [path] * [OPTIONS table_property_list] * select_statement; * }}} */) > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source > --- > > Key: SPARK-29174 > URL: https://issues.apache.org/jira/browse/SPARK-29174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > *using does not work for insert overwrite when in local but works when > insert overwrite in HDFS directory* > ** > > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite directory > '/user/trash2/' using parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.448 seconds) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' using parquet select * from trash1 a where a.country='PAK'; > Error: org.apache.spark.sql.catalyst.parser.ParseException: > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, > pos 0) > > == SQL == > insert overwrite local directory '/opt/trash2/' using parquet select * from > trash1 a where a.country='PAK' > ^^^ (state=,code=0) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' stored as parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > | | | > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29174) LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source
[ https://issues.apache.org/jira/browse/SPARK-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933194#comment-16933194 ] daile commented on SPARK-29174: --- I tried removing that check and found that it works. I don't know whether there would be any other impact. > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source > --- > > Key: SPARK-29174 > URL: https://issues.apache.org/jira/browse/SPARK-29174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > *using does not work for insert overwrite when in local but works when > insert overwrite in HDFS directory* > ** > > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite directory > '/user/trash2/' using parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.448 seconds) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' using parquet select * from trash1 a where a.country='PAK'; > Error: org.apache.spark.sql.catalyst.parser.ParseException: > LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, > pos 0) > > == SQL == > insert overwrite local directory '/opt/trash2/' using parquet select * from > trash1 a where a.country='PAK' > ^^^ (state=,code=0) > 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory > '/opt/trash2/' stored as parquet select * from trash1 a where a.country='PAK'; > +-+--+ > | Result | > +-+--+ > | | | > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
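For reference, the check referred to above is, paraphrased (exact location and helper names may differ across versions), a guard in the SQL parser's handling of INSERT OVERWRITE DIRECTORY:

{code:java}
// Paraphrased sketch of the guard in SparkSqlParser's AstBuilder: when the
// statement targets a data source (the USING form), a LOCAL keyword is rejected.
if (ctx.LOCAL != null) {
  operationNotAllowed(
    "LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source", ctx)
}
{code}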
[jira] [Created] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong
daile created SPARK-29586: - Summary: spark jdbc method param lowerBound and upperBound DataType wrong Key: SPARK-29586 URL: https://issues.apache.org/jira/browse/SPARK-29586 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4, 3.0.0 Reporter: daile {code:java} private def toBoundValueInWhereClause( value: Long, columnType: DataType, timeZoneId: String): String = { def dateTimeToString(): String = { val dateTimeStr = columnType match { case DateType => DateFormatter().format(value.toInt) case TimestampType => val timestampFormatter = TimestampFormatter.getFractionFormatter( DateTimeUtils.getZoneId(timeZoneId)) DateTimeUtils.timestampToString(timestampFormatter, value) } s"'$dateTimeStr'" } columnType match { case _: NumericType => value.toString case DateType | TimestampType => dateTimeToString() } }{code} partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc method only accepts Long bounds. {code:java} test("jdbc Suite2") { val df = spark .read .option("partitionColumn", "B") .option("lowerBound", "2017-01-01 10:00:00") .option("upperBound", "2019-01-01 10:00:00") .option("numPartitions", 5) .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties()) df.printSchema() df.show() }{code} it's OK {code:java} test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) df.printSchema() df.show() }{code} {code:java} java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 as date at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339) at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255) at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297) at org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664) ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong
[ https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-29586: -- Description: {code:java} private def toBoundValueInWhereClause( value: Long, columnType: DataType, timeZoneId: String): String = { def dateTimeToString(): String = { val dateTimeStr = columnType match { case DateType => DateFormatter().format(value.toInt) case TimestampType => val timestampFormatter = TimestampFormatter.getFractionFormatter( DateTimeUtils.getZoneId(timeZoneId)) DateTimeUtils.timestampToString(timestampFormatter, value) } s"'$dateTimeStr'" } columnType match { case _: NumericType => value.toString case DateType | TimestampType => dateTimeToString() } }{code} partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc method only accepts Long bounds. {code:java} test("jdbc Suite2") { val df = spark .read .option("partitionColumn", "B") .option("lowerBound", "2017-01-01 10:00:00") .option("upperBound", "2019-01-01 10:00:00") .option("numPartitions", 5) .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties()) df.printSchema() df.show() }{code} {code:java} test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) df.printSchema() df.show() }{code} {code:java} java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 as date at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong
[ https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-29586: -- Description: {code:java} private def toBoundValueInWhereClause( value: Long, columnType: DataType, timeZoneId: String): String = { def dateTimeToString(): String = { val dateTimeStr = columnType match { case DateType => DateFormatter().format(value.toInt) case TimestampType => val timestampFormatter = TimestampFormatter.getFractionFormatter( DateTimeUtils.getZoneId(timeZoneId)) DateTimeUtils.timestampToString(timestampFormatter, value) } s"'$dateTimeStr'" } columnType match { case _: NumericType => value.toString case DateType | TimestampType => dateTimeToString() } }{code} partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc method only accepts Long bounds. {code:java} test("jdbc Suite2") { val df = spark .read .option("partitionColumn", "B") .option("lowerBound", "2017-01-01 10:00:00") .option("upperBound", "2019-01-01 10:00:00") .option("numPartitions", 5) .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties()) df.printSchema() df.show() } {code} it's OK {code:java} test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) df.printSchema() df.show() } {code} {code:java} java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 as date at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
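In other words, only the options-based entry point can express DATE/TIMESTAMP bounds today, because the string bounds are parsed according to the partition column's type by toBoundValueInWhereClause, which the Long-typed jdbc() overload bypasses. A sketch of the working form for a timestamp partition column (url and table reuse the test fixtures above):

{code:java}
val partitioned = spark.read
  .format("jdbc")
  .option("url", urlWithUserAndPass)        // test fixture from the report
  .option("dbtable", "TEST.TIMETYPES")
  .option("partitionColumn", "B")
  .option("lowerBound", "2017-01-01 10:00:00")
  .option("upperBound", "2019-01-01 10:00:00")
  .option("numPartitions", "5")
  .load()
{code}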
[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks
[ https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004261#comment-17004261 ] daile commented on SPARK-29596: --- [~hyukjin.kwon] I checked the problem, reproduced it on version 2.4.4, and will raise a PR soon > Task duration not updating for running tasks > > > Key: SPARK-29596 > URL: https://issues.apache.org/jira/browse/SPARK-29596 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.2 >Reporter: Bharati Jadhav >Priority: Major > Attachments: Screenshot_Spark_live_WebUI.png > > > When looking at the task metrics for running tasks in the task table for the > related stage, the duration column is not updated until the task has > succeeded. The duration values are reported empty or 0 ms until the task has > completed. This is a change in behavior, from earlier versions, when the task > duration was continuously updated while the task was running. The missing > duration values can be observed for both short and long running tasks and for > multiple applications. > > To reproduce this, one can run any code from the spark-shell and observe the > missing duration values for any running task. Only when the task succeeds is > the duration value populated in the UI. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks
[ https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008552#comment-17008552 ] daile commented on SPARK-29596: --- [~hyukjin.kwon] The task detail list uses task.taskMetrics info, but task.taskMetrics is only updated when the task finishes. Is it feasible to get the task duration while it is still running? https://github.com/apache/spark/pull/27026 > Task duration not updating for running tasks > > > Key: SPARK-29596 > URL: https://issues.apache.org/jira/browse/SPARK-29596 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.2 >Reporter: Bharati Jadhav >Priority: Major > Attachments: Screenshot_Spark_live_WebUI.png > > > When looking at the task metrics for running tasks in the task table for the > related stage, the duration column is not updated until the task has > succeeded. The duration values are reported empty or 0 ms until the task has > completed. This is a change in behavior, from earlier versions, when the task > duration was continuously updated while the task was running. The missing > duration values can be observed for both short and long running tasks and for > multiple applications. > > To reproduce this, one can run any code from the spark-shell and observe the > missing duration values for any running task. Only when the task succeeds is > the duration value populated in the UI. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
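A hedged sketch of the direction discussed in the PR: for a running task the duration can be derived from its launch time rather than from the not-yet-populated task metrics. Field names follow the v1 status API; the exact fix in the PR may differ:

{code:java}
import org.apache.spark.status.api.v1.TaskData

// Fall back to wall-clock time since launch while the task is still running.
def liveDuration(task: TaskData, now: Long = System.currentTimeMillis()): Long =
  task.duration.getOrElse(now - task.launchTime.getTime)
{code}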
[jira] [Commented] (SPARK-31686) Return of String instead of array in function get_json_object
[ https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105939#comment-17105939 ] daile commented on SPARK-31686: --- [~bruneltouopi] Looks like it was specifically removed: {code:java} val buf = buffer.getBuffer if (dirty > 1) { g.writeRawValue(buf.toString) } else if (dirty == 1) { // remove outer array tokens g.writeRawValue(buf.substring(1, buf.length()-1)) } // else do not write anything {code} > Return of String instead of array in function get_json_object > - > > Key: SPARK-31686 > URL: https://issues.apache.org/jira/browse/SPARK-31686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: {code:json} > { > "customer": { > "addesses": [ > { "location": "arizona" } > ] > } > } > {code} > get_json_object(string(customer),'$addresses[*].location') > returns "arizona"; the expected result should be > ["arizona"] >Reporter: Touopi Touopi >Priority: Major > > When we select a node of a JSON object that is an array, and the array > contains one element, get_json_object returns a String > with " characters instead of a one-element array. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
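The quoted branch makes the behavior easy to demonstrate. A sketch with a hypothetical document: a single-element match loses its array brackets (the dirty == 1 branch), while a multi-element match keeps them:

{code:java}
// One match: the outer array tokens are stripped, so a bare value comes back.
spark.sql("""SELECT get_json_object('{"a":[{"loc":"arizona"}]}', '$.a[*].loc')""").show()
// -> arizona
// Two matches: dirty > 1, so the array form is preserved.
spark.sql("""SELECT get_json_object('{"a":[{"loc":"az"},{"loc":"nm"}]}', '$.a[*].loc')""").show()
// -> ["az","nm"]
{code}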
[jira] [Created] (SPARK-31193) set spark.master and spark.app.name conf default value
daile created SPARK-31193: - Summary: set spark.master and spark.app.name conf default value Key: SPARK-31193 URL: https://issues.apache.org/jira/browse/SPARK-31193 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.5, 2.4.4, 2.4.3, 2.4.2, 2.4.0, 2.3.3, 2.3.0, 3.1.0 Reporter: daile Fix For: 3.1.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31193) set spark.master and spark.app.name conf default value
[ https://issues.apache.org/jira/browse/SPARK-31193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31193: -- Description: {code:java} // code placeholder {code} I see the default value of the master setting in the spark-submit client ```scala // Global defaults. These should be keep to minimum to avoid confusing behavior. master = Option(master).getOrElse("local[*]") ``` but during our development and debugging, we will encounter this kind of problem: Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration This conflicts with the default setting ```scala //If we do val sparkConf = new SparkConf().setAppName("app") //When using the client to submit tasks to the cluster, the master will be overwritten by the local one sparkConf.set("spark.master", "local[*]") ``` so we have to do it like this ```scala val sparkConf = new SparkConf().setAppName("app") //Because a master set at runtime takes priority, we have to first check whether the master is already set, so that submitting to the cluster still works. sparkConf.set("spark.master",sparkConf.get("spark.master","local[*]")) ``` The same applies to spark.app.name. Is it better to handle it for users the way the submit client does? > set spark.master and spark.app.name conf default value > -- > > Key: SPARK-31193 > URL: https://issues.apache.org/jira/browse/SPARK-31193 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0, 2.3.3, 2.4.0, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 3.1.0 >Reporter: daile >Priority: Major > Fix For: 3.1.0 > > > > > {code:java} > // code placeholder > {code} > I see the default value of the master setting in the spark-submit client > ```scala > // Global defaults. These should be keep to minimum to avoid confusing > behavior. > master = Option(master).getOrElse("local[*]") > ``` > but during our development and debugging, we will encounter this kind of > problem: > Exception in thread "main" org.apache.spark.SparkException: A master URL must > be set in your configuration > This conflicts with the default setting > ```scala > //If we do > val sparkConf = new SparkConf().setAppName("app") > //When using the client to submit tasks to the cluster, the master will be > overwritten by the local one > sparkConf.set("spark.master", "local[*]") > ``` > so we have to do it like this > ```scala > val sparkConf = new SparkConf().setAppName("app") > //Because a master set at runtime takes priority, we have to > first check whether the master is already set, so that submitting the cluster > still works. > sparkConf.set("spark.master",sparkConf.get("spark.master","local[*]")) > ``` > The same applies to spark.app.name > Is it better to handle it for users the way the submit client does? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31193) set spark.master and spark.app.name conf default value
[ https://issues.apache.org/jira/browse/SPARK-31193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31193: -- Description: I see the default value of the master setting in the spark-submit client {code:java} // Global defaults. These should be keep to minimum to avoid confusing behavior. master = Option(master).getOrElse("local[*]") {code} but during our development and debugging, we will encounter this kind of problem: Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration This conflicts with the default setting {code:java} //If we do val sparkConf = new SparkConf().setAppName("app") //When using the client to submit tasks to the cluster, the master will be overwritten by the local one sparkConf.set("spark.master", "local[*]"){code} so we have to do it like this {code:java} val sparkConf = new SparkConf().setAppName("app") //Because a master set at runtime takes priority, we have to first check whether the master is already set, so that submitting to the cluster still works. sparkConf.set("spark.master",sparkConf.get("spark.master","local[*]")){code} The same applies to spark.app.name. Is it better to handle it for users the way the submit client does? was: {code:java} // code placeholder {code} I see the default value of the master setting in the spark-submit client ```scala // Global defaults. These should be keep to minimum to avoid confusing behavior. master = Option(master).getOrElse("local[*]") ``` but during our development and debugging, we will encounter this kind of problem: Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration This conflicts with the default setting ```scala //If we do val sparkConf = new SparkConf().setAppName("app") //When using the client to submit tasks to the cluster, the master will be overwritten by the local one sparkConf.set("spark.master", "local[*]") ``` so we have to do it like this ```scala val sparkConf = new SparkConf().setAppName("app") //Because a master set at runtime takes priority, we have to first check whether the master is already set, so that submitting to the cluster still works. sparkConf.set("spark.master",sparkConf.get("spark.master","local[*]")) ``` The same applies to spark.app.name Is it better to handle it for users the way the submit client does? > set spark.master and spark.app.name conf default value > -- > > Key: SPARK-31193 > URL: https://issues.apache.org/jira/browse/SPARK-31193 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0, 2.3.3, 2.4.0, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 3.1.0 >Reporter: daile >Priority: Major > Fix For: 3.1.0 > > > I see the default value of the master setting in the spark-submit client > {code:java} > // Global defaults. These should be keep to minimum to avoid confusing > behavior. > master = Option(master).getOrElse("local[*]") > {code} > but during our development and debugging, we will encounter this kind of > problem: > Exception in thread "main" org.apache.spark.SparkException: A master URL must > be set in your configuration > This conflicts with the default setting > > {code:java} > //If we do > val sparkConf = new SparkConf().setAppName("app") > //When using the client to submit tasks to the cluster, the master will be > overwritten by the local one > sparkConf.set("spark.master", "local[*]"){code} > > so we have to do it like this > {code:java} > val sparkConf = new SparkConf().setAppName("app") > //Because a master set at runtime takes priority, we have to > first check whether the master is already set, so that submitting to the cluster > still works. > sparkConf.set("spark.master",sparkConf.get("spark.master","local[*]")){code} > > > The same applies to spark.app.name > Is it better to handle it for users the way the submit client does? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
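One existing API already covers the pattern described above without pre-reading the conf: SparkConf.setIfMissing applies a value only when none was supplied by spark-submit or the environment, so cluster submission still wins:

{code:java}
import org.apache.spark.SparkConf

val sparkConf = new SparkConf().setAppName("app")
// Applied only when no master was supplied elsewhere.
sparkConf.setIfMissing("spark.master", "local[*]")
{code}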
[jira] [Created] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement
daile created SPARK-31457: - Summary: spark jdbc read hive created the wrong PreparedStatement Key: SPARK-31457 URL: https://issues.apache.org/jira/browse/SPARK-31457 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.2, 3.1.0 Environment: spark 2.3.2 hive 2.1.1 Reporter: daile {code:java} val res = spark .read .format("jdbc") .option("url", "jdbc:hive2://host:1/default") .option("dbtable", "user_info2") .option("driver","org.apache.hive.jdbc.HiveDriver") .option("user", "") .option("password","") .load() res.show(){code} and I get the wrong result: +--+--+---+ |user_info2.age|user_info2.sex|user_info2.birthday| +--+--+---+ |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| |user_info2.age|user_info2.sex|user_info2.birthday| +--+--+---+ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement
[ https://issues.apache.org/jira/browse/SPARK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31457: -- Attachment: hivejdbc3.png > spark jdbc read hive created the wrong PreparedStatement > > > Key: SPARK-31457 > URL: https://issues.apache.org/jira/browse/SPARK-31457 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 3.1.0 > Environment: spark 2.3.2 > hive 2.1.1 > >Reporter: daile >Priority: Major > Attachments: hivejdbc2.png, hivejdbc3.png, sparkhivejdbc.png > > > {code:java} > val res = spark > .read > .format("jdbc") > .option("url", "jdbc:hive2://host:1/default") > .option("dbtable", "user_info2") > .option("driver","org.apache.hive.jdbc.HiveDriver") > .option("user", "") > .option("password","") > .load() > res.show(){code} > get wrong result > +--+--+---+ > |user_info2.age|user_info2.sex|user_info2.birthday| > +--+--+---+ > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > +--+--+---+ > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement
[ https://issues.apache.org/jira/browse/SPARK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31457: -- Attachment: hivejdbc2.png > spark jdbc read hive created the wrong PreparedStatement > > > Key: SPARK-31457 > URL: https://issues.apache.org/jira/browse/SPARK-31457 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 3.1.0 > Environment: spark 2.3.2 > hive 2.1.1 > >Reporter: daile >Priority: Major > Attachments: hivejdbc2.png, hivejdbc3.png, sparkhivejdbc.png > > > {code:java} > val res = spark > .read > .format("jdbc") > .option("url", "jdbc:hive2://host:1/default") > .option("dbtable", "user_info2") > .option("driver","org.apache.hive.jdbc.HiveDriver") > .option("user", "") > .option("password","") > .load() > res.show(){code} > get wrong result > +--+--+---+ > |user_info2.age|user_info2.sex|user_info2.birthday| > +--+--+---+ > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > +--+--+---+ > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement
[ https://issues.apache.org/jira/browse/SPARK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31457: -- Attachment: sparkhivejdbc.png > spark jdbc read hive created the wrong PreparedStatement > > > Key: SPARK-31457 > URL: https://issues.apache.org/jira/browse/SPARK-31457 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 3.1.0 > Environment: spark 2.3.2 > hive 2.1.1 > >Reporter: daile >Priority: Major > Attachments: hivejdbc2.png, hivejdbc3.png, sparkhivejdbc.png > > > {code:java} > val res = spark > .read > .format("jdbc") > .option("url", "jdbc:hive2://host:1/default") > .option("dbtable", "user_info2") > .option("driver","org.apache.hive.jdbc.HiveDriver") > .option("user", "") > .option("password","") > .load() > res.show(){code} > get wrong result > +--+--+---+ > |user_info2.age|user_info2.sex|user_info2.birthday| > +--+--+---+ > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > |user_info2.age|user_info2.sex|user_info2.birthday| > +--+--+---+ > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
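The symptom above matches a known quoting mismatch: the generic JDBC dialect quotes identifiers with double quotes, which HiveQL reads as string literals, so the generated SELECT returns the column names themselves on every row. A hedged workaround sketch that registers a backtick-quoting dialect before the read:

{code:java}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:hive2")
  // Backticks are HiveQL's identifier quotes.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Register before issuing the read so the generated SELECT is valid HiveQL.
JdbcDialects.registerDialect(HiveDialect)
{code}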
[jira] [Commented] (SPARK-36422) Examples can't run in IDE directly
[ https://issues.apache.org/jira/browse/SPARK-36422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601181#comment-17601181 ] daile commented on SPARK-36422: --- You can try setting the IDEA run option {code:java} Include dependencies with "Provided" scope. {code} > Examples can't run in IDE directly > -- > > Key: SPARK-36422 > URL: https://issues.apache.org/jira/browse/SPARK-36422 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > I found the examples can't run in an IDE (such as IntelliJ). > For example, if you run `org.apache.spark.examples.sql.JavaUserDefinedScalar` in > the IDE, the error message is as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/spark/sql/SparkSession > at > org.apache.spark.examples.sql.JavaUserDefinedScalar.main(JavaUserDefinedScalar.java:33) > Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:419) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:352) > ... 1 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40099) Merge adjacent CaseWhen branches if their values are the same
[ https://issues.apache.org/jira/browse/SPARK-40099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686249#comment-17686249 ] daile commented on SPARK-40099: --- [~yumwang] can you help review it again? > Merge adjacent CaseWhen branches if their values are the same > - > > Key: SPARK-40099 > URL: https://issues.apache.org/jira/browse/SPARK-40099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > CASE > WHEN f1.buyer_id IS NOT NULL THEN 1 > WHEN f2.buyer_id IS NOT NULL THEN 1 > ELSE 0 > END > {code} > The expected result: > {code:sql} > CASE > WHEN f1.buyer_id IS NOT NULL or f2.buyer_id IS NOT NULL > THEN 1 > ELSE 0 > END > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
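A hedged sketch of the rewrite idea at the Catalyst expression level (not the actual optimizer rule; adjacency matters because a later branch is only reachable when the earlier conditions fail):

{code:java}
import org.apache.spark.sql.catalyst.expressions.{CaseWhen, Expression, Or}

// Fold each branch into the previous one when their result expressions are
// semantically equal, OR-ing the conditions; non-adjacent branches are left alone.
def mergeAdjacentBranches(cw: CaseWhen): CaseWhen = {
  val merged = cw.branches.foldLeft(List.empty[(Expression, Expression)]) {
    case ((prevCond, prevValue) :: rest, (cond, value)) if value.semanticEquals(prevValue) =>
      (Or(prevCond, cond), prevValue) :: rest
    case (acc, branch) => branch :: acc
  }.reverse
  cw.copy(branches = merged)
}
{code}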