[jira] [Commented] (SPARK-25078) Standalone does not work with spark.authenticate.secret and deploy-mode=cluster
[ https://issues.apache.org/jira/browse/SPARK-25078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581781#comment-16581781 ]

Bo Meng commented on SPARK-25078:
---------------------------------

What is your suggestion to improve this?

> Standalone does not work with spark.authenticate.secret and deploy-mode=cluster
> -------------------------------------------------------------------------------
>
> Key: SPARK-25078
> URL: https://issues.apache.org/jira/browse/SPARK-25078
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 2.4.0
> Reporter: Imran Rashid
> Priority: Major
>
> When running a Spark standalone cluster with spark.authenticate.secret set up,
> you cannot submit a program in cluster mode, even with the right secret. The
> driver fails with:
> {noformat}
> 18/08/09 08:17:21 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(systest); groups with view permissions: Set(); users with modify permissions: Set(systest); groups with modify permissions: Set()
> 18/08/09 08:17:21 ERROR SparkContext: Error initializing SparkContext.
> java.lang.IllegalArgumentException: requirement failed: A secret key must be specified via the spark.authenticate.secret config.
>         at scala.Predef$.require(Predef.scala:224)
>         at org.apache.spark.SecurityManager.initializeAuth(SecurityManager.scala:361)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:238)
>         at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
>         at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
>         ...
> {noformat}
> But it's actually doing the wrong check in {{SecurityManager.initializeAuth()}}.
> The secret is there; it's just in an environment variable, {{_SPARK_AUTH_SECRET}}
> (so it's not visible to another process).
>
> *Workaround*: in your program, you can pass a dummy secret into your Spark conf.
> Its value does not matter at all; it is ignored later, and when establishing
> connections the secret from the env variable is used. E.g.:
> {noformat}
> val conf = new SparkConf()
> conf.setIfMissing("spark.authenticate.secret", "doesn't matter")
> val sc = new SparkContext(conf)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
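The faulty check described above can be illustrated with a small plain-Scala sketch. This is a hedged, hypothetical model of the logic (`SecretCheck`, `secretAvailable`, and the `Map`-based conf/env are illustrative stand-ins, not Spark's actual `SecurityManager` internals): a secret should count as present whether it arrives via the conf key or via the env variable that cluster-mode drivers receive.

```scala
// Hypothetical sketch of the check SecurityManager.initializeAuth should make.
// In cluster mode the secret is delivered through _SPARK_AUTH_SECRET rather
// than the conf, so the check must accept either source. Names here are
// illustrative only.
object SecretCheck {
  val ConfKey = "spark.authenticate.secret"
  val EnvKey  = "_SPARK_AUTH_SECRET"

  def secretAvailable(conf: Map[String, String], env: Map[String, String]): Boolean =
    conf.get(ConfKey).orElse(env.get(EnvKey)).exists(_.nonEmpty)
}
```

Under this model, the reported failure is the conf-only lookup rejecting a run where only the env variable is set, which is exactly what the workaround's dummy conf value papers over.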
[jira] [Commented] (SPARK-22357) SparkContext.binaryFiles ignore minPartitions parameter
[ https://issues.apache.org/jira/browse/SPARK-22357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221078#comment-16221078 ]

Bo Meng commented on SPARK-22357:
---------------------------------

A quick fix could be as follows; correct me if I am wrong:

val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)

> SparkContext.binaryFiles ignore minPartitions parameter
> --------------------------------------------------------
>
> Key: SPARK-22357
> URL: https://issues.apache.org/jira/browse/SPARK-22357
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.2, 2.2.0
> Reporter: Weichen Xu
>
> This is a bug in binaryFiles: even though we give it the partitions, binaryFiles ignores them.
> The bug was introduced in Spark 2.1 (coming from Spark 2.0): in the file PortableDataStream.scala, the argument "minPartitions" is no longer used (with the push to master on 11/7/6):
> {code}
> /**
>  * Allow minPartitions set by end-user in order to keep compatibility with old
>  * Hadoop API, which is set through setMaxSplitSize.
>  */
> def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
>   val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
>   val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
>   val defaultParallelism = sc.defaultParallelism
>   val files = listStatus(context).asScala
>   val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
>   val bytesPerCore = totalBytes / defaultParallelism
>   val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
>   super.setMaxSplitSize(maxSplitSize)
> }
> {code}
> The code previously, in version 2.0, was:
> {code}
> def setMinPartitions(context: JobContext, minPartitions: Int) {
>   val totalLen = listStatus(context).asScala.filterNot(_.isDirectory).map(_.getLen).sum
>   val maxSplitSize = math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
>   super.setMaxSplitSize(maxSplitSize)
> }
> {code}
> The new code is very smart, but it ignores
> what the user passes in and uses the data size instead, which is a breaking change in some sense.
> In our specific case this was a problem: we initially read in just the file names, and only after that does the dataframe become very large, when the images themselves are read in; in this case the new code does not handle the partitioning very well.
> I'm not sure whether it can be easily fixed, because I don't understand the full context of the change in Spark (but at the very least the unused parameter should be removed to avoid confusion).
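Bo Meng's one-line suggestion can be sketched in isolation. The standalone helper below is a hypothetical re-derivation of the Spark 2.1 split-size computation with the proposed change folded in (the function, its parameters, and the sample numbers are illustrative; the real method lives in PortableDataStream.scala and pulls its inputs from the SparkConf):

```scala
// Sketch of the suggested fix: fold the user's minPartitions into the
// parallelism used for the bytes-per-core estimate, so a larger minPartitions
// yields a smaller max split size and therefore more partitions.
def maxSplitSize(totalBytes: Long,
                 defaultMaxSplitBytes: Long,
                 openCostInBytes: Long,
                 defaultParallelism: Int,
                 minPartitions: Int): Long = {
  // The proposed change: previously this was `sc.defaultParallelism` alone,
  // which is why minPartitions was silently ignored.
  val parallelism = Math.max(defaultParallelism, minPartitions)
  val bytesPerCore = totalBytes / parallelism
  Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
}
```

With 1000 total bytes, a 128-byte cap, a 4-byte open cost, and default parallelism 2, asking for minPartitions = 10 shrinks the split size to 100 bytes, whereas minPartitions = 1 leaves it capped at 128; under the unpatched formula both calls would behave identically.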
[jira] [Commented] (SPARK-20145) "SELECT * FROM range(1)" works, but "SELECT * FROM RANGE(1)" doesn't
[ https://issues.apache.org/jira/browse/SPARK-20145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949588#comment-15949588 ]

Bo Meng commented on SPARK-20145:
---------------------------------

You do not need to be assigned. Just go ahead and provide your solution in a PR.

> "SELECT * FROM range(1)" works, but "SELECT * FROM RANGE(1)" doesn't
> --------------------------------------------------------------------
>
> Key: SPARK-20145
> URL: https://issues.apache.org/jira/browse/SPARK-20145
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Juliusz Sompolski
>
> Executed at the clean tip of the master branch, with all default settings:
> scala> spark.sql("SELECT * FROM range(1)")
> res1: org.apache.spark.sql.DataFrame = [id: bigint]
> scala> spark.sql("SELECT * FROM RANGE(1)")
> org.apache.spark.sql.AnalysisException: could not resolve `RANGE` to a table-valued function; line 1 pos 14
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions$$anonfun$apply$1.applyOrElse(ResolveTableValuedFunctions.scala:126)
>   at org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions$$anonfun$apply$1.applyOrElse(ResolveTableValuedFunctions.scala:106)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   ...
> I believe it should be case insensitive?
[jira] [Commented] (SPARK-20145) "SELECT * FROM range(1)" works, but "SELECT * FROM RANGE(1)" doesn't
[ https://issues.apache.org/jira/browse/SPARK-20145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947726#comment-15947726 ]

Bo Meng commented on SPARK-20145:
---------------------------------

From the current code, I can see that builtinFunctions uses an exact match for the lookup ("range" as a key is all lowercase).
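The exact-match lookup Bo Meng points at, and the obvious fix, can be sketched with a plain `Map`. This is a hedged stand-in (the `Map[String, String]` shape and the value string are illustrative; the real table in ResolveTableValuedFunctions maps names to builder functions): normalizing the key before the lookup makes `RANGE` resolve the same way as `range`.

```scala
import java.util.Locale

// Illustrative stand-in for the builtinFunctions table, which keys
// table-valued functions by their lowercase name.
val builtinFunctions: Map[String, String] = Map("range" -> "RangeTableValuedFunction")

// Exact match, as in the current code: fails for "RANGE".
def lookupExact(name: String): Option[String] =
  builtinFunctions.get(name)

// Case-insensitive match: lowercase the key before looking it up.
def lookupCaseInsensitive(name: String): Option[String] =
  builtinFunctions.get(name.toLowerCase(Locale.ROOT))
```

Using `Locale.ROOT` avoids locale-dependent lowercasing surprises (e.g. the Turkish dotless i) for identifier-style keys.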
[jira] [Created] (SPARK-20146) Column comment information is missing for Thrift Server's TableSchema
Bo Meng created SPARK-20146:
----------------------------

Summary: Column comment information is missing for Thrift Server's TableSchema
Key: SPARK-20146
URL: https://issues.apache.org/jira/browse/SPARK-20146
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.2.0
Reporter: Bo Meng
Priority: Minor

I found this issue while doing some tests against the Thrift Server. The column comment information was missing when querying the TableSchema; currently, all comments are ignored. I will post a fix shortly.
[jira] [Comment Edited] (SPARK-20004) Spark thrift server overwrites spark.app.name
[ https://issues.apache.org/jira/browse/SPARK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935472#comment-15935472 ]

Bo Meng edited comment on SPARK-20004 at 3/21/17 10:32 PM:
-----------------------------------------------------------

I think you can still use --name for your app name. For example:

/spark/sbin/start-thriftserver.sh --name="My server 1"

was (Author: bomeng):
I think you can still use --name for your app name. For example:

/spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host"

> Spark thrift server overwrites spark.app.name
> ---------------------------------------------
>
> Key: SPARK-20004
> URL: https://issues.apache.org/jira/browse/SPARK-20004
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Egor Pahomov
> Priority: Minor
>
> {code}
> export SPARK_YARN_APP_NAME="ODBC server $host"
> /spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host"
> {code}
> And spark-defaults.conf contains:
> {code}
> spark.app.name "ODBC server spark01"
> {code}
> Still, the name in YARN is "Thrift JDBC/ODBC Server"
[jira] [Commented] (SPARK-20004) Spark thrift server overwrites spark.app.name
[ https://issues.apache.org/jira/browse/SPARK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935472#comment-15935472 ]

Bo Meng commented on SPARK-20004:
---------------------------------

I think you can still use --name for your app name. For example:

/spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host"
[jira] [Comment Edited] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng edited comment on SPARK-16173 at 6/23/16 9:58 PM:
----------------------------------------------------------

Using the latest master, I was not able to reproduce it; here is my code:
{quote}
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
{quote}
Anything I am missing? I am using Scala 2.11; does it only happen with Scala 2.10?

was (Author: bomeng):
Using the latest master, I was not able to reproduce it; here is my code:
{quote}
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
{quote}
Anything I am missing?

> Can't join describe() of DataFrame in Scala 2.10
> -------------------------------------------------
>
> Key: SPARK-16173
> URL: https://issues.apache.org/jira/browse/SPARK-16173
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.1, 2.0.0
> Reporter: Davies Liu
>
> describe() of DataFrame uses Seq() (it is actually an Iterator) to create another DataFrame, which cannot be serialized in Scala 2.10.
> {code} > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304) > at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2060) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) > at > org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashJoin.scala:82) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashJoin.scala:79) > at > org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashJoin.scala:79) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashJoin.scala:79) > at > 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.NotSerializableException: > scala.collection.Iterator$$anon$11 > Serialization stack: > - object not serializable (class: scala.collection.Iterator$$anon$11, > value: empty iterator) > - field (class: scala.collection.Iterator$$anonfun$toStream$1, name: > $outer, type: interface scala.collection.Iterator) > - object (class scala.collection.Iterator$$anonfun$toStream$1, > ) > - field (class: scala.collection.immutable.Stream$Cons, name: tl, type: > interface scala.Function0) > - object (class scala.collection.immutable.Stream$Cons, > Stream(WrappedArray(1), WrappedArray(2.0), WrappedArray(NaN), > WrappedArray(2), WrappedArray(2))) > - field (class: scala.collection.immutable.Stream$$anonfun$zip$1, name: > $outer, type: class scala.collection.immutable.Stream) > - object (class scala.collection.immutable.Stream$$anonfun$zip$1, > ) > - field (class: scala.collection.immutable.Stream$Cons, name: tl, type: > interface scala.Function0) > - object (class
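The root cause in the trace above is a non-serializable `scala.collection.Iterator` anonymous class captured inside a `Stream`. That failure mode can be probed without Spark at all, using plain Java serialization (the same mechanism Spark uses when shipping task closures). A minimal, self-contained sketch; the helper name is illustrative:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Probe whether a value survives Java serialization. Spark's task shipping
// fails with exactly this NotSerializableException when a closure drags in
// a non-serializable object such as an Iterator.
def isJavaSerializable(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }
```

A materialized collection such as a `List` serializes fine, while an un-materialized `Iterator` generally does not, which matches the `NotSerializableException: scala.collection.Iterator$$anon$11` in the trace and explains why forcing the intermediate rows into a concrete collection sidesteps the bug.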
[jira] [Updated] (SPARK-16004) Improve CatalogTable information
[ https://issues.apache.org/jira/browse/SPARK-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Meng updated SPARK-16004:
----------------------------

Description:
A few issues were found when running the "describe extended | formatted [tableName]" command:
1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does.
2. Comment fields display "null" instead of an empty string when the comment is None.
I will make a PR shortly.

was:
A few issues were found when running the "describe extended | formatted [tableName]" command:
1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does.
2. Owner is always empty, instead of the current login user who created the table.
3. Comment fields display "null" instead of an empty string when the comment is None.
I will make a PR shortly.

> Improve CatalogTable information
> --------------------------------
>
> Key: SPARK-16004
> URL: https://issues.apache.org/jira/browse/SPARK-16004
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Bo Meng
>
> A few issues were found when running the "describe extended | formatted [tableName]" command:
> 1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does.
> 2. Comment fields display "null" instead of an empty string when the comment is None.
> I will make a PR shortly.
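Point 1 above is the classic unset-epoch symptom: an unset last-access time is stored as 0, and formatting that instant in a US timezone renders as the moment just before the 1970 UTC epoch. A hedged sketch of the display-side fix (the helper is hypothetical, not the actual Spark or Hive code):

```scala
import java.util.Date

// Render an unset last-access time as "UNKNOWN" instead of formatting the
// epoch instant (which shows up as "Wed Dec 31 15:59:59 PST 1969" in a
// US-Pacific locale). Illustrative helper only.
def formatLastAccess(lastAccessMillis: Long): String =
  if (lastAccessMillis <= 0) "UNKNOWN"
  else new Date(lastAccessMillis).toString
```

Treating any non-positive value as "unset" also covers implementations that use -1 as the sentinel.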
[jira] [Comment Edited] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng edited comment on SPARK-16173 at 6/23/16 9:38 PM:
----------------------------------------------------------

Using the latest master, I was not able to reproduce it; here is my code:
{quote}
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
{quote}
Anything I am missing?

was (Author: bomeng):
Using the latest master, I was not able to reproduce it; here is my code:
{{
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
}}
Anything I am missing?
[jira] [Comment Edited] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng edited comment on SPARK-16173 at 6/23/16 9:37 PM:
----------------------------------------------------------

Using the latest master, I was not able to reproduce it; here is my code:
{{
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
}}
Anything I am missing?

was (Author: bomeng):
Using the latest master, I was not able to reproduce it; here is my code:

val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()

Anything I am missing?
[jira] [Commented] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng commented on SPARK-16173:
---------------------------------

Using the latest master, I was not able to reproduce it; here is my code:

val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()

Anything I am missing?
[jira] [Commented] (SPARK-15230) Back quoted column with dot in it fails when running distinct on dataframe
[ https://issues.apache.org/jira/browse/SPARK-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346981#comment-15346981 ] Bo Meng commented on SPARK-15230: - Can anyone update the "Component/s" of this JIRA? It should belong to "SQL", not "Examples". Thanks. > Back quoted column with dot in it fails when running distinct on dataframe > -- > > Key: SPARK-15230 > URL: https://issues.apache.org/jira/browse/SPARK-15230 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 1.6.0 >Reporter: Barry Becker >Assignee: Bo Meng > Fix For: 2.0.1 > > > When working with a dataframe, columns with dots in their names must be backquoted > (``) or the column name will not be found. This works for most dataframe > methods, but I discovered that it does not work for distinct(). > Suppose you have a dataFrame, testDf, with a DoubleType column named > {{pos.NoZero}}. This statement: > {noformat} > testDf.select(new Column("`pos.NoZero`")).distinct().collect().mkString(", ") > {noformat} > will fail with this error: > {noformat} > org.apache.spark.sql.AnalysisException: Cannot resolve column name > "pos.NoZero" among (pos.NoZero); > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1328) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2165) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1328) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1348) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1319) > at org.apache.spark.sql.DataFrame.distinct(DataFrame.scala:1612) > at > com.mineset.spark.vizagg.selection.SelectionExpressionSuite$$anonfun$40.apply$mcV$sp(SelectionExpressionSuite.scala:317) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
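The failure above comes down to how a column reference is resolved: a backquoted name must be treated as one literal column, while an unquoted dotted name means nesting. A minimal, hypothetical sketch of that distinction (helper names are illustrative, not Spark's actual resolver):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the idea behind the fix: a backquoted name such as
// `pos.NoZero` resolves as one literal column, while an unquoted pos.NoZero
// is a dot-separated nested-field path. dropDuplicates() hit the second path.
public class ColumnNames {
    static List<String> parse(String name) {
        if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
            // quoted: strip the backquotes, never split on dots
            return List.of(name.substring(1, name.length() - 1));
        }
        // unquoted: a dot separates nesting levels
        return Arrays.asList(name.split("\\."));
    }

    public static void main(String[] args) {
        // quoted form: one literal column name containing a dot
        if (!parse("`pos.NoZero`").equals(List.of("pos.NoZero")))
            throw new AssertionError();
        // unquoted form: (mis)read as a two-part path -- the failure mode above
        if (!parse("pos.NoZero").equals(List.of("pos", "NoZero")))
            throw new AssertionError();
    }
}
```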
[jira] [Commented] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341177#comment-15341177 ] Bo Meng commented on SPARK-16084: - Got it! > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Priority: Trivial > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341167#comment-15341167 ] Bo Meng commented on SPARK-16084: - OK! Next time I will send it to the mailing list if it is a doc change. > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Priority: Trivial > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341141#comment-15341141 ] Bo Meng commented on SPARK-16084: - [~sowen] Please let me know how to handle it next time, if I should not create a JIRA. Thanks. > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Priority: Trivial > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-16084: Summary: Minor javadoc issue with "Describe" table in the parser (was: Minor issue with "Describe" table in the parser) > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16084) Minor issue with "Describe" table in the parser
Bo Meng created SPARK-16084: --- Summary: Minor issue with "Describe" table in the parser Key: SPARK-16084 URL: https://issues.apache.org/jira/browse/SPARK-16084 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Minor The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16004) Improve CatalogTable information
Bo Meng created SPARK-16004: --- Summary: Improve CatalogTable information Key: SPARK-16004 URL: https://issues.apache.org/jira/browse/SPARK-16004 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng A few issues found when running the "describe extended | formatted [tableName]" command: 1. The last access time is incorrectly displayed as something like "Last Access Time: |Wed Dec 31 15:59:59 PST 1969"; I think we should display it as "UNKNOWN", as Hive does; 2. The owner is always empty, instead of the current login user who created the table; 3. The comment field displays "null" instead of an empty string when the comment is None. I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
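The odd "Wed Dec 31 15:59:59 PST 1969" value is what java.util.Date prints for a non-positive epoch offset (here -1 ms, i.e. "never accessed") on a US/Pacific JVM. A hedged sketch of the proposed fix, with a hypothetical helper name (not the actual Spark code):

```java
import java.util.Date;

// Hypothetical helper illustrating the fix: the metastore stores "never
// accessed" as 0 (or -1) millis, which java.util.Date renders as an instant
// just before the epoch. Mapping non-positive values to "UNKNOWN" matches
// what Hive displays.
public class LastAccess {
    static String format(long millis) {
        return millis <= 0 ? "UNKNOWN" : new Date(millis).toString();
    }

    public static void main(String[] args) {
        if (!format(-1L).equals("UNKNOWN")) throw new AssertionError();
        if (!format(0L).equals("UNKNOWN")) throw new AssertionError();
    }
}
```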
[jira] [Created] (SPARK-15978) Some improvement of "Show Tables"
Bo Meng created SPARK-15978: --- Summary: Some improvement of "Show Tables" Key: SPARK-15978 URL: https://issues.apache.org/jira/browse/SPARK-15978 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Minor I've found some minor issues in the "show tables" command: 1. In SessionCatalog.scala, the listTables(db: String) method calls listTables(formatDatabaseName(db), "*") to list all the tables for a given db, but in listTables(db: String, pattern: String) the db name is formatted once more. So I think we should remove formatDatabaseName() from the caller. 2. I suggest adding a sort to listTables(db: String) in InMemoryCatalog.scala, just like listDatabases(). I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15952) "show databases" does not get sorted result
Bo Meng created SPARK-15952: --- Summary: "show databases" does not get sorted result Key: SPARK-15952 URL: https://issues.apache.org/jira/browse/SPARK-15952 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Two issues I've found with the "show databases" command: 1. The returned database name list is not sorted; it is only sorted when a "like" pattern is used with it (Hive always returns a sorted list). 2. sql("show databases").show outputs a table with a column named "result", but sql("show tables").show outputs the column name "tableName"; to be consistent, we should use "databaseName" at least. I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
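The first point above can be sketched as: sort the names unconditionally, then apply the optional "like" pattern, rather than only getting ordered output on the pattern path. This is a hypothetical sketch of the idea, not the catalog code (and it assumes a simple '*'-only wildcard):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Minimal sketch (hypothetical, not Spark's InMemoryCatalog): always sort the
// database names, then filter by the optional "like" pattern, where '*'
// matches any character sequence. Sorting is independent of the pattern path.
public class ShowDatabases {
    static List<String> list(List<String> all, String pattern) {
        Stream<String> sorted = all.stream().sorted();
        if (pattern != null) {
            // naive '*' -> '.*' translation; assumes no other regex metacharacters
            Pattern p = Pattern.compile(pattern.replace("*", ".*"));
            sorted = sorted.filter(n -> p.matcher(n).matches());
        }
        return sorted.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dbs = List.of("sales", "default", "archive");
        if (!list(dbs, null).equals(List.of("archive", "default", "sales")))
            throw new AssertionError();
        if (!list(dbs, "d*").equals(List.of("default")))
            throw new AssertionError();
    }
}
```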
[jira] [Commented] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00
[ https://issues.apache.org/jira/browse/SPARK-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323442#comment-15323442 ] Bo Meng commented on SPARK-13268: - Why is this related to Spark? The conversion does not use any Spark function and I think the conversion loses the time zone information along the way. > SQL Timestamp stored as GMT but toString returns GMT-08:00 > -- > > Key: SPARK-13268 > URL: https://issues.apache.org/jira/browse/SPARK-13268 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Ilya Ganelin > > There is an issue with how timestamps are displayed/converted to Strings in > Spark SQL. The documentation states that the timestamp should be created in > the GMT time zone, however, if we do so, we see that the output actually > contains a -8 hour offset: > {code} > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) > res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) > res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 > {code} > This result is confusing, unintuitive, and introduces issues when converting > from DataFrames containing timestamps to RDDs which are then saved as text. > This has the effect of essentially shifting all dates in a dataset by 1 day. > The suggested fix for this is to update the timestamp toString representation > to either a) Include timezone or b) Correctly display in GMT. > This change may well introduce substantial and insidious bugs so I'm not sure > how best to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
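The comment's point can be demonstrated without Spark: java.sql.Timestamp stores only an instant (millis since the epoch), and its toString renders that instant in the JVM's default time zone, so the zone used at parse time is not retained. A small JDK-only sketch:

```java
import java.sql.Timestamp;
import java.time.Instant;

// java.sql.Timestamp holds a zone-less instant; toString formats it in the
// JVM's default time zone, which is where the apparent "-8 hour shift" in the
// report comes from.
public class TsDemo {
    public static void main(String[] args) {
        Timestamp ts =
            new Timestamp(Instant.parse("2015-01-01T00:00:00Z").toEpochMilli());
        // The stored instant is exact and zone-independent...
        if (ts.getTime() != 1420070400000L) throw new AssertionError();
        // ...but the printed form depends on the default zone: it reads
        // "2015-01-01 00:00:00.0" only on a UTC JVM, and a day earlier
        // (2014-12-31 16:00:00.0) on a US/Pacific JVM.
        System.out.println(ts);
    }
}
```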
[jira] [Commented] (SPARK-15613) Incorrect days to millis conversion
[ https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323386#comment-15323386 ] Bo Meng commented on SPARK-15613: - Does this only happen on 1.6? I have tried the latest master and it does not have this issue. > Incorrect days to millis conversion > > > Key: SPARK-15613 > URL: https://issues.apache.org/jira/browse/SPARK-15613 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: java version "1.8.0_91" >Reporter: Dmitry Bushev > > There is an issue with the {{DateTimeUtils.daysToMillis}} implementation. It > affects {{DateTimeUtils.toJavaDate}} and ultimately CatalystTypeConverter, > i.e. the conversion of a date stored as {{Int}} days from the epoch in InternalRow > to the {{java.sql.Date}} of the Row returned to the user. > > The issue can be reproduced with this test (all the following tests are in my > default timezone, Europe/Moscow): > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, > 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, > 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, > 14332, 14696, 15060) > {code} > For example, for day {{4108}} of the epoch, the correct date should be > {{1981-04-01}} > {code} > scala> DateTimeUtils.toJavaDate(4107) > res25: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4108) > res26: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4109) > res27: java.sql.Date = 1981-04-02 > {code} > There was a previous unsuccessful attempt to work around the problem in > SPARK-11415. It seems that the issue involves flaws in the java date implementation > and I don't see how it can be fixed without third-party libraries. > I was not able to identify the library of choice for Spark. 
The following > implementation uses [JSR-310|http://www.threeten.org/] > {code} > def millisToDays(millisUtc: Long): SQLDate = { > val instant = Instant.ofEpochMilli(millisUtc) > val zonedDateTime = instant.atZone(ZoneId.systemDefault) > zonedDateTime.toLocalDate.toEpochDay.toInt > } > def daysToMillis(days: SQLDate): Long = { > val localDate = LocalDate.ofEpochDay(days) > val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault) > zonedDateTime.toInstant.toEpochMilli > } > {code} > that produces correct results: > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res37: scala.collection.immutable.IndexedSeq[Int] = Vector() > scala> new java.sql.Date(daysToMillis(4108)) > res36: java.sql.Date = 1981-04-01 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
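The java.time-based implementation quoted above can be checked for the round-trip property with plain JDK code; here it is transcribed to Java (the day value is an int count of days since the epoch, like Spark's SQLDate):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Java transcription of the java.time-based implementation quoted above, so
// the round-trip property millisToDays(daysToMillis(d)) == d can be verified.
public class DayMillis {
    static int millisToDays(long millisUtc) {
        return (int) Instant.ofEpochMilli(millisUtc)
                .atZone(ZoneId.systemDefault()).toLocalDate().toEpochDay();
    }

    static long daysToMillis(int days) {
        return LocalDate.ofEpochDay(days)
                .atStartOfDay(ZoneId.systemDefault()).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Holds in any JVM default zone: atStartOfDay resolves a DST gap at
        // midnight to the first valid instant of the same local date, so the
        // round trip cannot drift by a day.
        for (int d = 0; d <= 20000; d++) {
            if (millisToDays(daysToMillis(d)) != d) {
                throw new AssertionError("round trip failed at day " + d);
            }
        }
    }
}
```

Note that day 4108 (the failing example above) round-trips correctly under this implementation.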
[jira] [Comment Edited] (SPARK-15613) Incorrect days to millis conversion
[ https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323386#comment-15323386 ] Bo Meng edited comment on SPARK-15613 at 6/9/16 9:24 PM: - Does this only happen on 1.6? I have tried the latest master and it does not have this issue. I have not tried on 1.6. was (Author: bomeng): Does this only happen on 1.6? I have tried the latest master and it does not have this issue. > Incorrect days to millis conversion > > > Key: SPARK-15613 > URL: https://issues.apache.org/jira/browse/SPARK-15613 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: java version "1.8.0_91" >Reporter: Dmitry Bushev > > There is an issue with the {{DateTimeUtils.daysToMillis}} implementation. It > affects {{DateTimeUtils.toJavaDate}} and ultimately CatalystTypeConverter, > i.e. the conversion of a date stored as {{Int}} days from the epoch in InternalRow > to the {{java.sql.Date}} of the Row returned to the user. > > The issue can be reproduced with this test (all the following tests are in my > default timezone, Europe/Moscow): > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, > 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, > 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, > 14332, 14696, 15060) > {code} > For example, for day {{4108}} of the epoch, the correct date should be > {{1981-04-01}} > {code} > scala> DateTimeUtils.toJavaDate(4107) > res25: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4108) > res26: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4109) > res27: java.sql.Date = 1981-04-02 > {code} > There was a previous unsuccessful attempt to work around the problem in > SPARK-11415. 
It seems that the issue involves flaws in the java date implementation > and I don't see how it can be fixed without third-party libraries. > I was not able to identify the library of choice for Spark. The following > implementation uses [JSR-310|http://www.threeten.org/] > {code} > def millisToDays(millisUtc: Long): SQLDate = { > val instant = Instant.ofEpochMilli(millisUtc) > val zonedDateTime = instant.atZone(ZoneId.systemDefault) > zonedDateTime.toLocalDate.toEpochDay.toInt > } > def daysToMillis(days: SQLDate): Long = { > val localDate = LocalDate.ofEpochDay(days) > val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault) > zonedDateTime.toInstant.toEpochMilli > } > {code} > that produces correct results: > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res37: scala.collection.immutable.IndexedSeq[Int] = Vector() > scala> new java.sql.Date(daysToMillis(4108)) > res36: java.sql.Date = 1981-04-01 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14923) Support "Extended" in "Describe" table DDL
[ https://issues.apache.org/jira/browse/SPARK-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng closed SPARK-14923. --- Resolution: Fixed > Support "Extended" in "Describe" table DDL > -- > > Key: SPARK-14923 > URL: https://issues.apache.org/jira/browse/SPARK-14923 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently, {{Extended}} keywords in {{Describe [Extended] }} DDL > is simply ignored. This JIRA is to bring it back with the similar behavior as > Hive does. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15806) Update doc for SPARK_MASTER_IP
Bo Meng created SPARK-15806: --- Summary: Update doc for SPARK_MASTER_IP Key: SPARK-15806 URL: https://issues.apache.org/jira/browse/SPARK-15806 Project: Spark Issue Type: Bug Components: Documentation Reporter: Bo Meng Priority: Minor SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15755) java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-15755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318692#comment-15318692 ] Bo Meng commented on SPARK-15755: - Could you provide a test case to reproduce the issue? > java.lang.NullPointerException when run spark 2.0 setting > spark.serializer=org.apache.spark.serializer.KryoSerializer > - > > Key: SPARK-15755 > URL: https://issues.apache.org/jira/browse/SPARK-15755 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > java.lang.NullPointerException when run spark 2.0 setting > spark.serializer=org.apache.spark.serializer.KryoSerializer > 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157) > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148) > at scala.math.Ordering$$anon$4.compare(Ordering.scala:111) > at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) > at java.util.PriorityQueue.siftUp(PriorityQueue.java:627) > at java.util.PriorityQueue.offer(PriorityQueue.java:329) > at java.util.PriorityQueue.add(PriorityQueue.java:306) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:711) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) > ... 
15 more > 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at >
[jira] [Commented] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes
[ https://issues.apache.org/jira/browse/SPARK-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313230#comment-15313230 ] Bo Meng commented on SPARK-15732: - There is no easy way to work around this issue, since "abstract" is a keyword in Java and Java does not allow keywords as identifiers. Renaming the field is the best workaround I can think of. > Dataset generated code "generated.java" Fails with Certain Case Classes > --- > > Key: SPARK-15732 > URL: https://issues.apache.org/jira/browse/SPARK-15732 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Version 2.0 Preview on the Databricks Community Edition >Reporter: Sanjay Dasgupta >Priority: Critical > > The Dataset code generation logic fails to handle field-names in case classes > that are also Java keywords (e.g. "abstract"). Scala has an escaping > mechanism (using backquotes) that allows Java (and Scala) keywords to be used > as names in programs, as in the example below: > case class PatApp(number: Int, title: String, `abstract`: String) > But this case class trips up the Dataset code generator. The following error > message is displayed when Datasets containing instances of such case classes > are processed. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in > stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 > (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "." > The following code can be used to replicate the problem. This code was run on > the Databricks CE, in a Scala notebook, in 3 separate cells as shown below: > // CELL 1: > // > // Create a Case Class with "abstract" as a field-name ... 
> // > package keywordissue > // The field-name abstract is a Java keyword ... > case class PatApp(number: Int, title: String, `abstract`: String) > // CELL 2: > // > // Create a Dataset using the case class ... > // > import keywordissue.PatApp > val applications = List(PatApp(1001, "1001", "Abstract 1001"), PatApp(1002, > "1002", "Abstract 1002"), PatApp(1003, "1003", "Abstract for 1003"), > PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004")) > val appsDataset = sc.parallelize(applications).toDF.as[PatApp] > // CELL 3: > // > // Force Dataset code-generation. This causes the error message to display ... > // > val duplicates = appsDataset.groupByKey(_.number).mapGroups((k, i) => (k, > i.length)).filter(_._2 > 0) > duplicates.collect().foreach(println) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
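The constraint the comment relies on, that Java rejects keywords as identifiers, can be checked directly with the JDK's in-memory compiler API. This is an illustrative sketch (not Spark's codegen) showing that the field declaration the generated Java would need cannot compile, while a renamed field compiles fine:

```java
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

// Sketch of why Dataset codegen fails for a `abstract` field: the generated
// Java source must name the field "abstract", which the Java compiler rejects.
// (Requires a JDK, since it uses the system Java compiler.)
public class KeywordDemo {
    // In-memory Java source file for the compiler API
    static class Src extends SimpleJavaFileObject {
        final String code;
        Src(String code) {
            super(URI.create("string:///T.java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    static boolean compiles(String source) {
        JavaCompiler jc = ToolProvider.getSystemJavaCompiler();
        // diagnostic listener swallows the expected error messages
        return jc.getTask(null, null, diag -> { }, null, null,
                List.of(new Src(source))).call();
    }

    public static void main(String[] args) {
        // "abstract" is a reserved word, so this field declaration is rejected
        if (compiles("class T { String abstract; }"))
            throw new AssertionError("expected compilation to fail");
        // renaming the field -- the suggested workaround -- compiles fine
        if (!compiles("class T { String abstractField; }"))
            throw new AssertionError("expected compilation to succeed");
    }
}
```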
[jira] [Updated] (SPARK-15737) Fix Jetty server start warning
[ https://issues.apache.org/jira/browse/SPARK-15737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-15737: Component/s: (was: SQL) Spark Core > Fix Jetty server start warning > -- > > Key: SPARK-15737 > URL: https://issues.apache.org/jira/browse/SPARK-15737 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Bo Meng >Priority: Minor > > While running any test cases, you will always see something like > "14:23:10.834 WARN org.eclipse.jetty.server.handler.AbstractHandler: No > Server set for org.eclipse.jetty.server.handler.ErrorHandler@76884e4b". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15737) Fix Jetty server start warning
Bo Meng created SPARK-15737: --- Summary: Fix Jetty server start warning Key: SPARK-15737 URL: https://issues.apache.org/jira/browse/SPARK-15737 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Priority: Minor While running any test cases, you will always see something like "14:23:10.834 WARN org.eclipse.jetty.server.handler.AbstractHandler: No Server set for org.eclipse.jetty.server.handler.ErrorHandler@76884e4b". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14752) LazilyGenerateOrdering throws NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-14752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312779#comment-15312779 ] Bo Meng commented on SPARK-14752: - I think this is a good approach. Thanks. > LazilyGenerateOrdering throws NullPointerException > -- > > Key: SPARK-14752 > URL: https://issues.apache.org/jira/browse/SPARK-14752 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Rajesh Balamohan > > codebase: spark master > DataSet: TPC-DS > Client: $SPARK_HOME/bin/beeline > Example query to reproduce the issue: > select i_item_id from item order by i_item_id limit 10; > Explain plan output > {noformat} > explain select i_item_id from item order by i_item_id limit 10; > +--+--+ > | > plan > > | > +--+--+ > | == Physical Plan == > TakeOrderedAndProject(limit=10, orderBy=[i_item_id#1229 ASC], > output=[i_item_id#1229]) > +- WholeStageCodegen >: +- Project [i_item_id#1229] >: +- Scan HadoopFiles[i_item_id#1229] Format: ORC, PushedFilters: [], > ReadSchema: struct | > +--+--+ > {noformat} > Exception: > {noformat} > TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1791) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157) > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148) > at scala.math.Ordering$$anon$4.compare(Ordering.scala:111) > at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:669) > at java.util.PriorityQueue.siftUp(PriorityQueue.java:645) > at java.util.PriorityQueue.offer(PriorityQueue.java:344) > at java.util.PriorityQueue.add(PriorityQueue.java:321) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708) > at >
[jira] [Created] (SPARK-15537) clean up the temp folders after finishing the tests
Bo Meng created SPARK-15537: --- Summary: clean up the temp folders after finishing the tests Key: SPARK-15537 URL: https://issues.apache.org/jira/browse/SPARK-15537 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Some of the test cases, e.g. OrcSourceSuite, create temp folders and temp files inside them, but the folders are not removed after the tests finish. If we keep running the test cases, this leaves lots of temp files behind and occupies disk space. The reason is that dir.delete() does not work if dir is not empty; we need to recursively delete the contents before deleting the folder itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
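The recursive deletion described above can be sketched as follows. This is a minimal standalone helper for illustration, not Spark's actual cleanup utility; the class and file names are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RecursiveDelete {
    // File.delete() fails on a non-empty directory, so delete the
    // children first, then the (now empty) directory itself.
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null if f is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        // Mimic what a test suite leaves behind: a temp dir with a file in it.
        File dir = Files.createTempDirectory("orc-test").toFile();
        new File(dir, "part-00000.orc").createNewFile();

        System.out.println(dir.delete());           // false: dir is not empty
        System.out.println(deleteRecursively(dir)); // true: recursive delete works
        System.out.println(dir.exists());           // false
    }
}
```

The first call demonstrates the bug described in the issue: a plain delete() on the non-empty temp folder silently fails, which is why the folders accumulate across test runs.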
[jira] [Created] (SPARK-15468) fix some typos while browsing the codes
Bo Meng created SPARK-15468: --- Summary: fix some typos while browsing the codes Key: SPARK-15468 URL: https://issues.apache.org/jira/browse/SPARK-15468 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Minor Found some typos while browsing the codes briefly.
[jira] [Comment Edited] (SPARK-15230) Back quoted column with dot in it fails when running distinct on dataframe
[ https://issues.apache.org/jira/browse/SPARK-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285302#comment-15285302 ] Bo Meng edited comment on SPARK-15230 at 5/16/16 9:11 PM: -- In the description, {{it does not work for describe()}} should be {{it does not work for distinct()}}, please update the description, thanks. was (Author: bomeng): In the description, `it does not work for describe()` should be `it does not work for distinct()`, please update the description, thanks. > Back quoted column with dot in it fails when running distinct on dataframe > -- > > Key: SPARK-15230 > URL: https://issues.apache.org/jira/browse/SPARK-15230 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 1.6.0 >Reporter: Barry Becker > > When working with a dataframe columns with .'s in them must be backquoted > (``) or the column name will not be found. This works for most dataframe > methods, but I discovered that it does not work for describe(). > Suppose you have a dataFrame, testDf, with a DoubleType column named > {{pos.NoZero}}. 
This statement: > {noformat} > testDf.select(new Column("`pos.NoZero`")).distinct().collect().mkString(", ") > {noformat} > will fail with this error: > {noformat} > org.apache.spark.sql.AnalysisException: Cannot resolve column name > "pos.NoZero" among (pos.NoZero); > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1328) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2165) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1328) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1348) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1319) > at org.apache.spark.sql.DataFrame.distinct(DataFrame.scala:1612) > at > com.mineset.spark.vizagg.selection.SelectionExpressionSuite$$anonfun$40.apply$mcV$sp(SelectionExpressionSuite.scala:317) > {noformat}
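The resolution failure reported above hinges on how a dotted column name is split: an unquoted dot denotes nested-field access, while backquotes mark the dot as part of a single column name. The sketch below mimics (but is not) Spark's attribute-name parsing; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeName {
    // Split a column reference on unquoted dots; backquotes toggle
    // quoting and are dropped from the resulting name parts.
    static List<String> parse(String name) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : name.toCharArray()) {
            if (c == '`') {
                inQuotes = !inQuotes;
            } else if (c == '.' && !inQuotes) {
                parts.add(cur.toString()); // unquoted dot: nested-field separator
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("`pos.NoZero`")); // [pos.NoZero]  - one column
        System.out.println(parse("pos.NoZero"));   // [pos, NoZero] - struct field access
    }
}
```

The bug is that dropDuplicates()/distinct() apparently lost this quote-aware treatment somewhere along the resolution path, so "pos.NoZero" was no longer matched against the literal column name.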
[jira] [Commented] (SPARK-15062) Show on DataFrame causes OutOfMemoryError, NegativeArraySizeException or segfault
[ https://issues.apache.org/jira/browse/SPARK-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267477#comment-15267477 ] Bo Meng commented on SPARK-15062: - I will make a PR shortly. > Show on DataFrame causes OutOfMemoryError, NegativeArraySizeException or > segfault > -- > > Key: SPARK-15062 > URL: https://issues.apache.org/jira/browse/SPARK-15062 > Project: Spark > Issue Type: Bug > Components: SQL > Environment: spark-2.0.0-SNAPSHOT using commit hash > 90787de864b58a1079c23e6581381ca8ffe7685f and Java 1.7.0_67 >Reporter: koert kuipers >Priority: Blocker > > {noformat} > scala> val dfComplicated = sc.parallelize(List((Map("1" -> "a"), List("b", > "c")), (Map("2" -> "b"), List("d", "e".toDF > ... > dfComplicated: org.apache.spark.sql.DataFrame = [_1: map, _2: > array] > scala> dfComplicated.show > java.lang.OutOfMemoryError: Java heap space > at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229) > at org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:241) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2121) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2121) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2121) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2408) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2120) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2127) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1861) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1860) > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2438) > at org.apache.spark.sql.Dataset.head(Dataset.scala:1860) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2077) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:238) > at org.apache.spark.sql.Dataset.show(Dataset.scala:529) > at org.apache.spark.sql.Dataset.show(Dataset.scala:489) > at org.apache.spark.sql.Dataset.show(Dataset.scala:498) > ... 6 elided > scala> > {noformat} > By increasing memory to 8G one will instead get a NegativeArraySizeException > or a segfault. > See here for original discussion: > http://apache-spark-developers-list.1001551.n3.nabble.com/spark-2-segfault-td17381.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14897) Upgrade Jetty to latest version of 8/9
[ https://issues.apache.org/jira/browse/SPARK-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267256#comment-15267256 ] Bo Meng commented on SPARK-14897: - I will do it once I've got a chance. Thanks. > Upgrade Jetty to latest version of 8/9 > -- > > Key: SPARK-14897 > URL: https://issues.apache.org/jira/browse/SPARK-14897 > Project: Spark > Issue Type: Improvement >Reporter: Adam Kramer > Labels: web-ui > > It looks like the head/master branch of Spark uses quite an old version of > Jetty: 8.1.14.v20131031 > There have been some announcement of security vulnerabilities, notably in > 2015 and there are versions of both 8 and 9 that address those. We recently > left a web-ui port open and had the server compromised within days. Albeit, > this upgrade shouldn't be the only security improvement made, the current > version is clearly vulnerable, as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0
[ https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261457#comment-15261457 ] Bo Meng commented on SPARK-14955: - Only after committer gets PR merged, then this JIRA will be automatically closed. Thanks. > JDBCRelation should report an IllegalArgumentException if stride equals 0 > - > > Key: SPARK-14955 > URL: https://issues.apache.org/jira/browse/SPARK-14955 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1, 1.6.1 >Reporter: Yang Juan hu >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In file > https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala > row 56 and 57 has following line > val stride: Long = (partitioning.upperBound / numPartitions > - partitioning.lowerBound / numPartitions) > if we invoke columnPartition as below: > columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) ); > columnPartition will generate following where condition: > whereClause: partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 > it will cause data skew, the last partition will contain all data. > Propose to throw an exception if stride equal 0, help spark user to aware > data skew issue ASAP. > if (stride == 0) return throw new > IllegalArgumentException("partitioning.upperBound / numPartitions - > partitioning.lowerBound / numPartitions is zero"); > partitionColumn must be an integral type, if we want to load a big table from > DBMS, we need to do some work around. 
> Real case to export data from ORACLE database through pyspark. > #data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7) AS PART_ID, A.* FROM DBMS_TAB A ) > TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="0", > upperBound="7").load() > #no data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7)+1 AS PART_ID, A.* FROM DBMS_TAB A > ) TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="1", > upperBound="8").load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
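The integer arithmetic quoted in the report can be sketched as follows. This is a standalone illustration of the stride computation and the proposed guard, not the actual JDBCRelation code:

```java
public class StrideCheck {
    // Same expression as the quoted JDBCRelation lines, in integer arithmetic.
    static long stride(long lowerBound, long upperBound, int numPartitions) {
        return upperBound / numPartitions - lowerBound / numPartitions;
    }

    public static void main(String[] args) {
        // lowerBound=0, upperBound=7, numPartitions=8:
        // 7/8 - 0/8 == 0, so every partition boundary collapses to 0 and
        // the last partition receives all the data (the skew described above).
        long s = stride(0, 7, 8);
        System.out.println("stride = " + s); // 0

        // Proposed fix: fail fast instead of silently skewing the partitions.
        try {
            if (s == 0) {
                throw new IllegalArgumentException(
                    "partitioning.upperBound / numPartitions - "
                    + "partitioning.lowerBound / numPartitions is zero");
            }
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // The pyspark workaround shifts the bounds to 1..8, giving a non-zero stride.
        System.out.println("stride = " + stride(1, 8, 8)); // 1
    }
}
```

This also shows why the ORA_HASH(...)+1 workaround in the report works: moving the bounds from 0..7 to 1..8 makes the quotient difference non-zero.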
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261126#comment-15261126 ] Bo Meng commented on SPARK-14959: - I have tried on master branch, it works fine with the latest code. > Problem Reading partitioned ORC or Parquet files > - > > Key: SPARK-14959 > URL: https://issues.apache.org/jira/browse/SPARK-14959 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.0.0 > Environment: Hadoop 2.7.1.2.4.0.0-169 (HDP 2.4) >Reporter: Sebastian YEPES FERNANDEZ >Priority: Critical > > Hello, > I have noticed that in the pasts days there is an issue when trying to read > partitioned files from HDFS. > I am running on Spark master branch #c544356 > The write actually works but the read fails. > {code:title=Issue Reproduction} > case class Data(id: Int, text: String) > val ds = spark.createDataset( Seq(Data(0, "hello"), Data(1, "hello"), Data(0, > "world"), Data(1, "there")) ) > scala> > ds.write.mode(org.apache.spark.sql.SaveMode.Overwrite).format("parquet").partitionBy("id").save("/user/spark/test.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> java.io.FileNotFoundException: Path is not a file: > /user/spark/test.parquet/id=0 > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242) > at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227) > at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1285) > at > org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:221) > at > org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:217) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:228) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:209) > at > org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:372) > at > org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:360) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at >
[jira] [Commented] (SPARK-14965) StructType throws exception for missing field
[ https://issues.apache.org/jira/browse/SPARK-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261096#comment-15261096 ] Bo Meng commented on SPARK-14965: - I believe returning null does not make sense here, so an exception is preferred. > StructType throws exception for missing field > - > > Key: SPARK-14965 > URL: https://issues.apache.org/jira/browse/SPARK-14965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.6.1 >Reporter: Gregory Hart >Priority: Minor > > The ScalaDoc for StructType.apply(String) indicates the method should return > null if it does not contain a field with the given name. The method > implementation throws an exception in this case instead. > I suggest that either the implementation should be corrected to return null > if the field is not found, or the ScalaDoc be corrected to indicate an > exception is thrown.
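The two contracts under discussion can be contrasted with a small sketch. This is plain Java for illustration only, not the actual StructType implementation; the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldLookup {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String dataType) {
        fields.put(name, dataType);
    }

    // Contract the ScalaDoc describes: null for a missing field.
    String applyOrNull(String name) {
        return fields.get(name);
    }

    // Contract the implementation follows: fail loudly on a missing field,
    // so a typo in a field name surfaces immediately instead of as a
    // NullPointerException somewhere downstream.
    String apply(String name) {
        String t = fields.get(name);
        if (t == null) {
            throw new IllegalArgumentException("Field \"" + name + "\" does not exist.");
        }
        return t;
    }

    public static void main(String[] args) {
        FieldLookup schema = new FieldLookup();
        schema.add("id", "IntegerType");
        System.out.println(schema.apply("id"));            // IntegerType
        System.out.println(schema.applyOrNull("missing")); // null
        try {
            schema.apply("missing");
        } catch (IllegalArgumentException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

The comment above argues for the second contract: silently returning null just moves the failure away from its cause, so fixing the ScalaDoc rather than the implementation is the smaller and safer change.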
[jira] [Commented] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0
[ https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260902#comment-15260902 ] Bo Meng commented on SPARK-14955: - Please note that it also affect the master branch. > JDBCRelation should report an IllegalArgumentException if stride equals 0 > - > > Key: SPARK-14955 > URL: https://issues.apache.org/jira/browse/SPARK-14955 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1, 1.6.1 >Reporter: Yang Juan hu >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In file > https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala > row 56 and 57 has following line > val stride: Long = (partitioning.upperBound / numPartitions > - partitioning.lowerBound / numPartitions) > if we invoke columnPartition as below: > columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) ); > columnPartition will generate following where condition: > whereClause: partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 > it will cause data skew, the last partition will contain all data. > Propose to throw an exception if stride equal 0, help spark user to aware > data skew issue ASAP. > if (stride == 0) return throw new > IllegalArgumentException("partitioning.upperBound / numPartitions - > partitioning.lowerBound / numPartitions is zero"); > partitionColumn must be an integral type, if we want to load a big table from > DBMS, we need to do some work around. 
> Real case to export data from ORACLE database through pyspark. > #data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7) AS PART_ID, A.* FROM DBMS_TAB A ) > TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="0", > upperBound="7").load() > #no data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7)+1 AS PART_ID, A.* FROM DBMS_TAB A > ) TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="1", > upperBound="8").load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14897) Upgrade Jetty to latest version of 8/9
[ https://issues.apache.org/jira/browse/SPARK-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260740#comment-15260740 ] Bo Meng commented on SPARK-14897: - Although it would be easy to move to the latest 8.x version, Jetty 8 is EOL. See: [http://download.eclipse.org/jetty/] Jetty 9 has some significant changes: the thread pool now needs to be passed into the Server constructor, while Spark's current implementation calculates the pool size based on the connectors. So I am not sure what has to be done to keep it compatible with the current implementation. Any suggestions? > Upgrade Jetty to latest version of 8/9 > -- > > Key: SPARK-14897 > URL: https://issues.apache.org/jira/browse/SPARK-14897 > Project: Spark > Issue Type: Improvement >Reporter: Adam Kramer > Labels: web-ui > > It looks like the head/master branch of Spark uses quite an old version of > Jetty: 8.1.14.v20131031 > There have been some announcement of security vulnerabilities, notably in > 2015 and there are versions of both 8 and 9 that address those. We recently > left a web-ui port open and had the server compromised within days. Albeit, > this upgrade shouldn't be the only security improvement made, the current > version is clearly vulnerable, as-is.
[jira] [Commented] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0
[ https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260352#comment-15260352 ] Bo Meng commented on SPARK-14955: - I will take a look to see what can be improved. > JDBCRelation should report an IllegalArgumentException if stride equals 0 > - > > Key: SPARK-14955 > URL: https://issues.apache.org/jira/browse/SPARK-14955 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1, 1.6.1 >Reporter: Yang Juan hu >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In file > https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala > row 56 and 57 has following line > val stride: Long = (partitioning.upperBound / numPartitions > - partitioning.lowerBound / numPartitions) > if we invoke columnPartition as below: > columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) ); > columnPartition will generate following where condition: > whereClause: partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 > it will cause data skew, the last partition will contain all data. > Propose to throw an exception if stride equal 0, help spark user to aware > data skew issue ASAP. > if (stride == 0) return throw new > IllegalArgumentException("partitioning.upperBound / numPartitions - > partitioning.lowerBound / numPartitions is zero"); > partitionColumn must be an integral type, if we want to load a big table from > DBMS, we need to do some work around. 
> Real case to export data from ORACLE database through pyspark. > #data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7) AS PART_ID, A.* FROM DBMS_TAB A ) > TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="0", > upperBound="7").load() > #no data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7)+1 AS PART_ID, A.* FROM DBMS_TAB A > ) TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="1", > upperBound="8").load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14928) Support variable substitution in SET command
[ https://issues.apache.org/jira/browse/SPARK-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng closed SPARK-14928. --- Resolution: Won't Fix > Support variable substitution in SET command > > > Key: SPARK-14928 > URL: https://issues.apache.org/jira/browse/SPARK-14928 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > In the {{SET key=value}} command, value can be defined as a variable and > replaced by substitution. > Since we have {{VARIABLE_SUBSTITUTE_ENABLED}} and > {{VARIABLE_SUBSTITUTE_DEPTH}} defined in the {{SQLConf}}, it is nice to use > them in the SET command.
[jira] [Commented] (SPARK-14928) Support variable substitution in SET command
[ https://issues.apache.org/jira/browse/SPARK-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259069#comment-15259069 ] Bo Meng commented on SPARK-14928: - The parser already handles this, so I am closing the issue. > Support variable substitution in SET command > > > Key: SPARK-14928 > URL: https://issues.apache.org/jira/browse/SPARK-14928 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > In the {{SET key=value}} command, value can be defined as a variable and > replaced by substitution. > Since we have {{VARIABLE_SUBSTITUTE_ENABLED}} and > {{VARIABLE_SUBSTITUTE_DEPTH}} defined in the {{SQLConf}}, it is nice to use > them in the SET command.
[jira] [Created] (SPARK-14928) Support variable substitution in SET command
Bo Meng created SPARK-14928: --- Summary: Support variable substitution in SET command Key: SPARK-14928 URL: https://issues.apache.org/jira/browse/SPARK-14928 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng In the {{SET key=value}} command, value can be defined as a variable and replaced by substitution. Since we have {{VARIABLE_SUBSTITUTE_ENABLED}} and {{VARIABLE_SUBSTITUTE_DEPTH}} defined in the {{SQLConf}}, it is nice to use them in the SET command.
[jira] [Updated] (SPARK-14923) Support "Extended" in "Describe" table DDL
[ https://issues.apache.org/jira/browse/SPARK-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14923: Description: Currently, the {{Extended}} keyword in the {{Describe [Extended] }} DDL is simply ignored. This JIRA is to bring it back with similar behavior to what Hive does. > Support "Extended" in "Describe" table DDL > -- > > Key: SPARK-14923 > URL: https://issues.apache.org/jira/browse/SPARK-14923 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently, {{Extended}} keywords in {{Describe [Extended] }} DDL > is simply ignored. This JIRA is to bring it back with the similar behavior as > Hive does.
[jira] [Created] (SPARK-14923) Support "Extended" in "Describe" table DDL
Bo Meng created SPARK-14923: --- Summary: Support "Extended" in "Describe" table DDL Key: SPARK-14923 URL: https://issues.apache.org/jira/browse/SPARK-14923 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Currently, the {{Extended}} keyword in {{Describe [Extended] }} DDL is simply ignored. This JIRA is to bring it back with behavior similar to Hive's.
[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253400#comment-15253400 ] Bo Meng commented on SPARK-14840: - I am testing against master: 1. I do not think your test is valid, at least it should be: sqlContext.sql("drop table tmp.order") 2. It works fine if you just add {{`}} to {{order}}, without it, it will throw exception. sqlContext.sql("drop table `order`"); I have ignored {{tmp}} here. > Cannot drop a table which has the name starting with 'or' > - > > Key: SPARK-14840 > URL: https://issues.apache.org/jira/browse/SPARK-14840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Kwangwoo Kim > > sqlContext("drop table tmp.order") > The above code makes error as following: > 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order > 16/04/22 14:27:19 INFO ParseDriver: Parse Completed > 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected > tmp.order > ^ > java.lang.RuntimeException: [1.5] failure: identifier expected > tmp.order > ^ > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) > at > $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37) > at $line15.$read$$iwC$$iwC$$iwC.(:39) > at $line15.$read$$iwC$$iwC.(:41) > at $line15.$read$$iwC.(:43) > at $line15.$read.(:45) > at $line15.$read$.(:49) > at $line15.$read$.() > at $line15.$eval$.(:7) > at $line15.$eval$.() > at $line15.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at 
org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at >
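The backtick workaround suggested in the comment above can be sketched as a small identifier-quoting helper. This is a hypothetical pure-Python illustration of the idea, not Spark's parser; the reserved-word list here is a tiny stand-in for the real grammar's keyword set.

```python
# Tiny stand-in for a SQL grammar's reserved-word list (illustrative only).
RESERVED = {"order", "select", "from", "where", "group", "by"}

def quote_if_needed(identifier):
    """Backtick-quote an identifier when it collides with a SQL keyword.

    This is why `drop table tmp.order` fails while
    drop table tmp.`order` parses: the bare keyword is rejected by
    the identifier rule, while the quoted form is accepted.
    """
    if identifier.lower() in RESERVED:
        return "`%s`" % identifier
    return identifier

def qualified_name(db, table):
    # Quote each part of a db.table name independently.
    return "%s.%s" % (quote_if_needed(db), quote_if_needed(table))
```

So a safe drop statement would be built as {{"drop table " + qualified_name("tmp", "order")}}.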
[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253376#comment-15253376 ] Bo Meng commented on SPARK-14840: - I think because {{order}} is a keyword, please try not to use it. > Cannot drop a table which has the name starting with 'or' > - > > Key: SPARK-14840 > URL: https://issues.apache.org/jira/browse/SPARK-14840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Kwangwoo Kim > > sqlContext("drop table tmp.order") > The above code makes error as following: > 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order > 16/04/22 14:27:19 INFO ParseDriver: Parse Completed > 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected > tmp.order > ^ > java.lang.RuntimeException: [1.5] failure: identifier expected > tmp.order > ^ > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at 
org.apache.spark.sql.DataFrame.(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) > at > $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37) > at $line15.$read$$iwC$$iwC$$iwC.(:39) > at $line15.$read$$iwC$$iwC.(:41) > at $line15.$read$$iwC.(:43) > at $line15.$read.(:45) > at $line15.$read$.(:49) > at $line15.$read$.() > at $line15.$eval$.(:7) > at $line15.$eval$.() > at $line15.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at
[jira] [Comment Edited] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253376#comment-15253376 ] Bo Meng edited comment on SPARK-14840 at 4/22/16 5:34 AM: -- I think because {{order}} is a keyword, please try not to use it as table name. was (Author: bomeng): I think because {{order}} is a keyword, please try not to use it. > Cannot drop a table which has the name starting with 'or' > - > > Key: SPARK-14840 > URL: https://issues.apache.org/jira/browse/SPARK-14840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Kwangwoo Kim > > sqlContext("drop table tmp.order") > The above code makes error as following: > 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order > 16/04/22 14:27:19 INFO ParseDriver: Parse Completed > 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected > tmp.order > ^ > java.lang.RuntimeException: [1.5] failure: identifier expected > tmp.order > ^ > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > 
at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) > at > $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37) > at $line15.$read$$iwC$$iwC$$iwC.(:39) > at $line15.$read$$iwC$$iwC.(:41) > at $line15.$read$$iwC.(:43) > at $line15.$read.(:45) > at $line15.$read$.(:49) > at $line15.$read$.() > at $line15.$eval$.(:7) > at $line15.$eval$.() > at $line15.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at >
[jira] [Issue Comment Deleted] (SPARK-14541) SQL function: IFNULL, NULLIF, NVL and NVL2
[ https://issues.apache.org/jira/browse/SPARK-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14541: Comment: was deleted (was: I will try to do it one by one. ) > SQL function: IFNULL, NULLIF, NVL and NVL2 > -- > > Key: SPARK-14541 > URL: https://issues.apache.org/jira/browse/SPARK-14541 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > It will be great to have these SQL functions: > IFNULL, NULLIF, NVL, NVL2 > The meaning of these functions could be found in oracle docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14819) Improve the "SET" and "SET -v" command
[ https://issues.apache.org/jira/browse/SPARK-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14819: Description: Currently {{SET}} and {{SET -v}} commands are similar to Hive {{SET}} command except the following difference: 1. The result is not sorted; 2. When using {{SET}} and {{SET -v}}, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current {{SET}} command more consistent to Hive output. was: Currently `SET` and `SET -v` commands are similar to Hive `SET` command except the following difference: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent to Hive output. > Improve the "SET" and "SET -v" command > -- > > Key: SPARK-14819 > URL: https://issues.apache.org/jira/browse/SPARK-14819 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently {{SET}} and {{SET -v}} commands are similar to Hive {{SET}} command > except the following difference: > 1. The result is not sorted; > 2. When using {{SET}} and {{SET -v}}, in addition to the Hive related > properties, it will also list all the system properties and environment > properties, which is very useful in some cases. > This JIRA is trying to make the current {{SET}} command more consistent to > Hive output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14819) Improve the "SET" and "SET -v" command
[ https://issues.apache.org/jira/browse/SPARK-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14819: Description: Currently `SET` and `SET -v` commands are similar to Hive `SET` command except the following difference: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent to Hive output. was: Currently `SET` and `SET -v` commands are similar to Hive `SET` command except the following difference: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some case. This JIRA is trying to make the current `SET` command more consistent to Hive output. > Improve the "SET" and "SET -v" command > -- > > Key: SPARK-14819 > URL: https://issues.apache.org/jira/browse/SPARK-14819 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently `SET` and `SET -v` commands are similar to Hive `SET` command > except the following difference: > 1. The result is not sorted; > 2. When using `SET` and `SET -v`, in addition to the Hive related properties, > it will also list all the system properties and environment properties, which > is very useful in some cases. > This JIRA is trying to make the current `SET` command more consistent to Hive > output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14819) Improve the "SET" and "SET -v" command
Bo Meng created SPARK-14819: --- Summary: Improve the "SET" and "SET -v" command Key: SPARK-14819 URL: https://issues.apache.org/jira/browse/SPARK-14819 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Currently `SET` and `SET -v` commands are similar to the Hive `SET` command except for the following differences: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive-related properties, they will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent with Hive's output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
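The two points above (sorted output; conf entries merged with system/environment properties) can be sketched in a few lines. This is a hypothetical pure-Python illustration of the desired behavior, not Spark's `SET` implementation.

```python
import os

def set_v_output(sql_conf, include_system=True):
    """Render SET -v style output as sorted key=value lines.

    sql_conf is a plain dict standing in for SQLConf entries; when
    include_system is set, environment properties are merged in as
    well, mirroring point 2 of the issue. Sorting the merged result
    addresses point 1. Illustrative sketch only.
    """
    entries = dict(sql_conf)
    if include_system:
        entries.update(os.environ)
    return ["%s=%s" % (k, v) for k, v in sorted(entries.items())]
```

For example, `set_v_output({"b": "2", "a": "1"}, include_system=False)` yields the keys in sorted order rather than insertion order.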
[jira] [Commented] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248265#comment-15248265 ] Bo Meng commented on SPARK-14414: - Can anyone update the 'Assignee' for this one, since my code was already merged in? If there is still something left I can work on, please advise, thanks! > Make error messages consistent across DDLs > -- > > Key: SPARK-14414 > URL: https://issues.apache.org/jira/browse/SPARK-14414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > There are many different error messages right now when the user tries to run > something that's not supported. We might throw AnalysisException or > ParseException or NoSuchFunctionException etc. We should make all of these > consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14460) DataFrameWriter JDBC doesn't Quote/Escape column names
[ https://issues.apache.org/jira/browse/SPARK-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242330#comment-15242330 ] Bo Meng commented on SPARK-14460: - I have added a test case that uses "order" as a column name. Please check it out. Thanks. > DataFrameWriter JDBC doesn't Quote/Escape column names > -- > > Key: SPARK-14460 > URL: https://issues.apache.org/jira/browse/SPARK-14460 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Sean Rose > Labels: easyfix > > When I try to write a DataFrame which contains a column with a space in it > ("Patient Address"), I get an error: java.sql.BatchUpdateException: Incorrect > syntax near 'Address' > I believe the issue is that JdbcUtils.insertStatement isn't quoting/escaping > column names. JdbcDialect has the "quoteIdentifier" method, which could be > called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
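The fix direction the reporter suggests (route every column name through the dialect's quote method when building the INSERT) can be sketched as follows. This is a hedged pure-Python illustration, not the actual JdbcUtils code; the helper names are hypothetical.

```python
def quote_identifier(col, quote='"'):
    # Escape embedded quote characters by doubling them, then wrap,
    # the usual ANSI-style identifier quoting convention.
    return quote + col.replace(quote, quote * 2) + quote

def insert_statement(table, columns):
    """Build a parameterized INSERT with every column name quoted.

    Without quoting, a column like "Patient Address" produces
    broken SQL (syntax error near 'Address'); quoting each name
    makes spaces and keywords safe.
    """
    cols = ", ".join(quote_identifier(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, placeholders)
```

For example, `insert_statement("t", ["Patient Address", "id"])` keeps the spaced column name intact inside double quotes.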
[jira] [Commented] (SPARK-14614) Add `bround` function
[ https://issues.apache.org/jira/browse/SPARK-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240259#comment-15240259 ] Bo Meng commented on SPARK-14614: - I have tried this on Hive 1.2.1; this function appears to have been dropped. > Add `bround` function > - > > Key: SPARK-14614 > URL: https://issues.apache.org/jira/browse/SPARK-14614 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Dongjoon Hyun > > This issue aims to add the `bround` function (aka banker's rounding) by extending > the current `round` implementation. > Hive supports `bround` since 1.3.0. [Language > Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]. > {code} > hive> select round(2.5), bround(2.5); > OK > 3.0 2.0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
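The `round` vs `bround` difference in the Hive snippet above comes down to tie-breaking: half-up rounding versus banker's (half-even) rounding. A small sketch of the semantics using Python's decimal module (illustrative of the behavior, not Spark's implementation):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_half_up(x, scale=0):
    # Plain `round`: ties round away from zero (2.5 -> 3.0).
    q = Decimal(1).scaleb(-scale)
    return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_UP))

def bround(x, scale=0):
    # Banker's rounding: ties go to the nearest even digit
    # (2.5 -> 2.0, 3.5 -> 4.0), which avoids a systematic upward
    # bias when summing many rounded values.
    q = Decimal(1).scaleb(-scale)
    return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_EVEN))
```

This reproduces the Hive example: `round_half_up(2.5)` gives 3.0 while `bround(2.5)` gives 2.0.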
[jira] [Commented] (SPARK-14541) SQL function: IFNULL, NULLIF, NVL and NVL2
[ https://issues.apache.org/jira/browse/SPARK-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240157#comment-15240157 ] Bo Meng commented on SPARK-14541: - I will try to do it one by one. > SQL function: IFNULL, NULLIF, NVL and NVL2 > -- > > Key: SPARK-14541 > URL: https://issues.apache.org/jira/browse/SPARK-14541 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > It will be great to have these SQL functions: > IFNULL, NULLIF, NVL, NVL2 > The meaning of these functions could be found in oracle docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
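The Oracle-style semantics of the four requested functions can be sketched compactly. This is a pure-Python illustration of the null-handling behavior (with `None` standing in for SQL NULL), not Spark's eventual implementation:

```python
def ifnull(a, b):
    # IFNULL(a, b): b when a is null, otherwise a.
    return b if a is None else a

def nullif(a, b):
    # NULLIF(a, b): null when a equals b, otherwise a.
    return None if a == b else a

def nvl(a, b):
    # NVL is a synonym for IFNULL in most dialects.
    return ifnull(a, b)

def nvl2(a, b, c):
    # NVL2(a, b, c): b when a is NOT null, otherwise c.
    return c if a is None else b
```

Note the `is None` checks: a falsy-but-non-null value like 0 must still count as "not null", so truthiness tests would be wrong here.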
[jira] [Commented] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236359#comment-15236359 ] Bo Meng commented on SPARK-14441: - I think DDLSuite and DDLCommandSuite can be combined into one, as can HiveDDLSuite and HiveDDLCommandSuite, since they are just testing different stages. If you agree, I will make the changes. > Consolidate DDL tests > - > > Key: SPARK-14441 > URL: https://issues.apache.org/jira/browse/SPARK-14441 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing > whether a test should exist in one or the other. It also makes it less clear > whether our test coverage is comprehensive. Ideally we should consolidate > these files as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14532) Spark SQL IF/ELSE does not handle Double correctly
[ https://issues.apache.org/jira/browse/SPARK-14532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235795#comment-15235795 ] Bo Meng commented on SPARK-14532: - Just verified, it works fine with master. > Spark SQL IF/ELSE does not handle Double correctly > -- > > Key: SPARK-14532 > URL: https://issues.apache.org/jira/browse/SPARK-14532 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Al M > > I am using Spark SQL to add new columns to my data. Below is an example > snipped in Scala: > {code}myDF.withColumn("newcol", new > Column(SqlParser.parseExpression(sparkSqlExpr))).show{code} > *What Works* > If sparkSqlExpr = "IF(1=1, 1, 0)" then i see 1 in the result as expected. > If sparkSqlExpr = "IF(1=1, 1.0, 1.5)" then i see 1.0 in the result as > expected. > If sparkSqlExpr = "IF(1=1, 'A', 'B')" then i see 'A' in the result as > expected. > *What does not Work* > If sparkSqlExpr = "IF(1=1, 1.0, 0.0)" then I see error > org.apache.spark.sql.AnalysisException: cannot resolve 'if ((1 = 1)) 1.0 else > 0.0' due to data type mismatch: differing types in 'if ((1 = 1)) 1.0 else > 0.0' (decimal(2,1) and decimal(1,1)).; > If sparkSqlExpr = "IF(1=1, 1.0, 10.0)" then I see error If sparkSqlExpr = > "IF(1=1, 1.0, 0.0)" then I see error > org.apache.spark.sql.AnalysisException: cannot resolve 'if ((1 = 1)) 1.0 else > 10.0' due to data type mismatch: differing types in 'if ((1 = 1)) 1.0 else > 10.0' (decimal(2,1) and decimal(3,1)).; > If sparkSqlExpr = "IF(1=1, 1.1, 1.11)" then I see error > org.apache.spark.sql.AnalysisException: cannot resolve 'if ((1 = 1)) 1.1 else > 1.11' due to data type mismatch: differing types in 'if ((1 = 1)) 1.1 else > 1.11' (decimal(2,1) and decimal(3,2)).; > It looks like the Spark SQL typing system is seeing doubles as different > types depending on the number of digits before and after the decimal point -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
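The analysis errors quoted above boil down to finding a common decimal type for the two branches of the IF: `decimal(2,1)` and `decimal(1,1)` should unify rather than fail. The standard widening rule (keep the larger scale, and enough integer digits for either operand) can be sketched as follows. This is a hypothetical helper illustrating the promotion idea, not Spark's actual rules, which additionally bound the total precision:

```python
def widen_decimal(p1, s1, p2, s2):
    """Common decimal(p, s) type for decimal(p1,s1) and decimal(p2,s2).

    Keep the larger scale, plus enough integer digits to hold
    either operand, so e.g. decimal(2,1) and decimal(1,1) widen to
    decimal(2,1) instead of raising a type-mismatch error.
    """
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return (int_digits + scale, scale)
```

Under this rule the failing examples from the report all resolve: 1.0 vs 0.0 widens to decimal(2,1), and 1.1 vs 1.11 widens to decimal(3,2).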
[jira] [Comment Edited] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng edited comment on SPARK-14127 at 4/9/16 9:50 PM: - A little change to your creation of table will show comment in the "DESCRIBE": {quote}create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);{quote} was (Author: bomeng): A little change of your creation of table will show comment in the "DESCRIBE": {quote}create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);{quote} > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng edited comment on SPARK-14127 at 4/9/16 9:49 PM: - A little change of your creation of table will show comment in the "DESCRIBE": {quote}create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);{quote} was (Author: bomeng): A little change of your creation of table will show comment in the "DESCRIBE": create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string); > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng edited comment on SPARK-14127 at 4/9/16 9:49 PM: - A little change of your creation of table will show comment in the "DESCRIBE": create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string); was (Author: bomeng): A little change of your creation of table will show comment in the "DESCRIBE": `create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);` > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng commented on SPARK-14127: - A little change of your creation of table will show comment in the "DESCRIBE": `create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);` > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14496) some typos in the java doc while browsing the codes
Bo Meng created SPARK-14496: --- Summary: some typos in the java doc while browsing the codes Key: SPARK-14496 URL: https://issues.apache.org/jira/browse/SPARK-14496 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Trivial Really minor issues. I just found them while looking into the catalog codes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14460) DataFrameWriter JDBC doesn't Quote/Escape column names
[ https://issues.apache.org/jira/browse/SPARK-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231077#comment-15231077 ] Bo Meng edited comment on SPARK-14460 at 4/8/16 3:04 AM: - Thanks [~srose03] for finding the root cause - that makes the fix easier. I will post the fix shortly. was (Author: bomeng): I can take a look. Thanks. > DataFrameWriter JDBC doesn't Quote/Escape column names > -- > > Key: SPARK-14460 > URL: https://issues.apache.org/jira/browse/SPARK-14460 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Sean Rose > Labels: easyfix > > When I try to write a DataFrame which contains a column with a space in it > ("Patient Address"), I get an error: java.sql.BatchUpdateException: Incorrect > syntax near 'Address' > I believe the issue is that JdbcUtils.insertStatement isn't quoting/escaping > column names. JdbcDialect has the "quoteIdentifier" method, which could be > called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14460) DataFrameWriter JDBC doesn't Quote/Escape column names
[ https://issues.apache.org/jira/browse/SPARK-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231077#comment-15231077 ] Bo Meng commented on SPARK-14460: - I can take a look. Thanks. > DataFrameWriter JDBC doesn't Quote/Escape column names > -- > > Key: SPARK-14460 > URL: https://issues.apache.org/jira/browse/SPARK-14460 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Sean Rose > Labels: easyfix > > When I try to write a DataFrame which contains a column with a space in it > ("Patient Address"), I get an error: java.sql.BatchUpdateException: Incorrect > syntax near 'Address' > I believe the issue is that JdbcUtils.insertStatement isn't quoting/escaping > column names. JdbcDialect has the "quoteIdentifier" method, which could be > called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
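The fix suggested in the issue can be sketched outside Spark. Below is a minimal Python illustration, using the stdlib sqlite3 driver rather than Spark's JdbcUtils, of why quoting every identifier (the role JdbcDialect's quoteIdentifier plays) makes a column name containing a space valid SQL. The helpers quote_identifier and insert_statement are hypothetical, not Spark APIs:

```python
import sqlite3

def quote_identifier(name: str) -> str:
    # Double any embedded quote characters, then wrap the whole
    # identifier in double quotes (ANSI SQL delimited-identifier style).
    return '"' + name.replace('"', '""') + '"'

def insert_statement(table: str, columns: list) -> str:
    # Build an INSERT with every identifier quoted, so names with
    # spaces (e.g. "Patient Address") no longer break the statement.
    cols = ", ".join(quote_identifier(c) for c in columns)
    params = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_identifier(table)} ({cols}) VALUES ({params})"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE patients ("Patient Address" TEXT)')
conn.execute(insert_statement("patients", ["Patient Address"]), ("42 Main St",))
row = conn.execute('SELECT "Patient Address" FROM patients').fetchone()
print(row[0])  # 42 Main St
```

Without the quoting, the generated `INSERT INTO patients (Patient Address) ...` fails to parse, which matches the "Incorrect syntax near 'Address'" error in the report.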
[jira] [Updated] (SPARK-14429) Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Summary: Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL (was: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL) > Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc. DDL. In the pattern, users can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA: > # Currently, we use `replaceAll()` to replace `\*` with `.\*`, but the > replacement is scattered across several places; it would be good to do it in > one place; > # Consistency with Hive: the pattern is case-insensitive in Hive and > whitespace is trimmed, but the current pattern matching does not do that. For > example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* > '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
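The three items in the issue can be sketched as a small, hypothetical Python helper. The real fix lives in Spark's Scala code; filter_pattern below only mirrors the behavior the JIRA asks for - trim surrounding whitespace, match case-insensitively, treat `*` as a wildcard and `|` as alternation, and sort the result:

```python
import re

def filter_pattern(names, pattern):
    # Normalize the SHOW TABLES/FUNCTIONS LIKE pattern as the JIRA
    # describes: trim whitespace, split on '|' into alternatives,
    # and turn '*' into the regex wildcard '.*'.
    regexes = [p.strip().replace("*", ".*") for p in pattern.strip().split("|")]
    matched = [n for n in names
               if any(re.fullmatch(r, n, re.IGNORECASE) for r in regexes)]
    return sorted(matched)  # item 3: return the result sorted

tables = ["t2", "t1", "s1", "t3"]
print(filter_pattern(tables, " T* "))  # ['t1', 't2', 't3']
```

With this normalization, the example from the issue works: `' T* '` lists all the t-tables despite the surrounding spaces and the upper-case `T`.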
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. # Sort the result. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. # Sort the result. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. 
> I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* > '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. # Sort the result. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. 
> I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. 
> # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. 
> I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* ' {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}SHOW TABLES LIKE ' T* ' {panel} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* ' {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}SHOW TABLES LIKE ' T* ' {panel} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}`SHOW TABLES LIKE ' T* ' `{panel} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {panel}SHOW TABLES LIKE ' T* > ' {panel} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* ' {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}`SHOW TABLES LIKE ' T* ' `{panel} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {panel}`SHOW TABLES LIKE ' T* > ' `{panel} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace {block}`\*` with `.\*`{block}, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` > will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace {block}`\*` with `.\*`{block}, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace {block}`\*` with > `.\*`{block}, but the replacement was scattered in several places; it is good > to have one place to do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` > will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in SHOW TABLES / FUNCTIONS etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` > will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429:

Description:
LIKE is commonly used in SHOW TABLES / FUNCTIONS and similar DDL. In the pattern, users can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA:
1. Currently we use `replaceAll()` to replace `\*` with `.\*`, but the replacement is scattered across several places; it would be better to do it in one place.
2. Consistency with Hive: in Hive the pattern is case insensitive and surrounding whitespace is trimmed, but the current pattern matching does neither. For example, given tables (t1, t2, t3), `SHOW TABLES LIKE ' T* '` will list all the t-tables. Please use Hive to verify it.

was: (the same text, ending with "I will post a PR shortly.")
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429:

Summary: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL (was: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE ")
[jira] [Created] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE "
Bo Meng created SPARK-14429:

Summary: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE "
Key: SPARK-14429
URL: https://issues.apache.org/jira/browse/SPARK-14429
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Bo Meng
Priority: Minor

LIKE is commonly used in SHOW TABLES / FUNCTIONS and similar DDL. In the pattern, users can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA:
1. Currently we use `replaceAll()` to replace `\*` with `.\*`, but the replacement is scattered across several places; it would be better to do it in one place.
2. Consistency with Hive: in Hive the pattern is case insensitive and surrounding whitespace is trimmed, but the current pattern matching does neither. For example, given tables (t1, t2, t3), `SHOW TABLES LIKE ' T* '` will list all the t-tables. Please use Hive to verify it.
I will post a PR shortly.
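The Hive-compatible matching described in this issue (trimmed pattern, case-insensitive matching, `|` separating alternatives, `*` as a wildcard) can be centralized in a single helper. The sketch below is a hypothetical illustration of the proposal, not Spark's actual implementation; the `LikePattern` object and its method names are invented:

```scala
object LikePattern {
  // Convert a SHOW TABLES / FUNCTIONS LIKE pattern into regexes, Hive-style:
  // trim surrounding whitespace, split alternatives on '|', replace the SQL
  // wildcard '*' with the regex '.*', and match case-insensitively via (?i).
  def toRegexes(pattern: String): Seq[scala.util.matching.Regex] =
    pattern.trim.split("\\|").toSeq.map { alt =>
      ("(?i)" + alt.trim.replaceAll("\\*", ".*")).r
    }

  // A name matches if any alternative matches it in full.
  def matches(pattern: String, name: String): Boolean =
    toRegexes(pattern).exists(_.pattern.matcher(name).matches())
}
```

With tables t1, t2, t3, `LikePattern.matches(" T* ", "t1")` returns true, mirroring the `SHOW TABLES LIKE ' T* '` example from the description.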
[jira] [Commented] (SPARK-14398) Audit non-reserved keyword list in ANTLR4 parser.
[ https://issues.apache.org/jira/browse/SPARK-14398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225767#comment-15225767 ] Bo Meng commented on SPARK-14398:

Not a problem. I will work on it tomorrow. Thanks.

> Audit non-reserved keyword list in ANTLR4 parser.
> Key: SPARK-14398
> URL: https://issues.apache.org/jira/browse/SPARK-14398
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Herman van Hovell
> Fix For: 2.0.0
>
> We need to check whether all keywords that were non-reserved in the `old` ANTLR3 parser are still non-reserved in the ANTLR4 parser. Notable exceptions are the join keywords {{LEFT}}, {{RIGHT}} & {{FULL}}; these used to be non-reserved and are now reserved.
[jira] [Updated] (SPARK-14383) Missing "|" in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14383:

Summary: Missing "|" in the g4 definition (was: Missing | in the g4 definition)

> Missing "|" in the g4 definition
> Key: SPARK-14383
> URL: https://issues.apache.org/jira/browse/SPARK-14383
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Bo Meng
> Priority: Trivial
>
> It is a trivial bug I found in the g4 file: a "|" is missing between DISTRIBUTE and UNSET. I will post the PR shortly.
[jira] [Updated] (SPARK-14383) Missing | in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14383:

Description: It is a trivial bug I found in the g4 file: a "|" is missing between DISTRIBUTE and UNSET. I will post the PR shortly. (was: It is really a trivial bug in the g4 file I found. I will post the PR shortly.)
[jira] [Updated] (SPARK-14383) Missing | in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14383:

Description: It is really a trivial bug in the g4 file I found. I will post the PR shortly.
[jira] [Created] (SPARK-14383) Missing | in the g4 definition
Bo Meng created SPARK-14383:

Summary: Missing | in the g4 definition
Key: SPARK-14383
URL: https://issues.apache.org/jira/browse/SPARK-14383
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Bo Meng
Priority: Trivial
[jira] [Closed] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng closed SPARK-14323.

Resolution: Won't Fix

> [SQL] SHOW FUNCTIONS did not work properly
> Key: SPARK-14323
> URL: https://issues.apache.org/jira/browse/SPARK-14323
> Project: Spark
> Issue Type: Bug
> Reporter: Bo Meng
> Priority: Minor
>
> The SHOW FUNCTIONS syntax is described here:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
> When "\*" is used in the LIKE clause, the expected results are not returned. This is because "\*" is not escaped before being passed to the regex. Unescaped, a pattern such as "\*f\*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
> Try this:
> val p = "\*f\*".r
[jira] [Comment Edited] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225083#comment-15225083 ] Bo Meng edited comment on SPARK-14323 at 4/4/16 9:26 PM:

I did a deep investigation of pattern matching for LIKE in show tables/functions. Here is what I found: Hive only supports \* and | as wildcards. Using ".\*" to replace "\*" is right. The only issue is that ShowFunctions() in commands.scala currently does not use it, which causes the test cases to fail. Using listFunctions() in SessionCatalog should resolve the problem; that will happen in SPARK-14123.

was (Author: bomeng): (the same comment, with the wildcards written as unescaped * rather than \*)
[jira] [Commented] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225083#comment-15225083 ] Bo Meng commented on SPARK-14323:

I did a deep investigation of pattern matching for LIKE in show tables/functions. Here is what I found: Hive only supports * and | as wildcards. Using ".*" to replace "*" is right. The only issue is that ShowFunctions() in commands.scala currently does not use it, which causes the test cases to fail. Using listFunctions() in SessionCatalog should resolve the problem; that will happen in SPARK-14123.
[jira] [Commented] (SPARK-14283) Avoid sort in randomSplit when possible
[ https://issues.apache.org/jira/browse/SPARK-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223637#comment-15223637 ] Bo Meng commented on SPARK-14283:

Could you please provide more details, such as test cases, use cases, etc.?

> Avoid sort in randomSplit when possible
> Key: SPARK-14283
> URL: https://issues.apache.org/jira/browse/SPARK-14283
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Joseph K. Bradley
>
> Dataset.randomSplit sorts each partition in order to guarantee an ordering and make randomSplit deterministic given the seed. Since randomSplit is used a fair amount in ML, it would be great to avoid the sort when possible. Are there cases when it could be avoided?
[jira] [Created] (SPARK-14341) Throw exception on unsupported Create/Drop Macro DDL commands
Bo Meng created SPARK-14341:

Summary: Throw exception on unsupported Create/Drop Macro DDL commands
Key: SPARK-14341
URL: https://issues.apache.org/jira/browse/SPARK-14341
Project: Spark
Issue Type: Improvement
Reporter: Bo Meng
Priority: Minor

According to [SPARK-14123|https://issues.apache.org/jira/browse/SPARK-14123], we need to throw an exception for the Create/Drop Macro DDL commands.
[jira] [Updated] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14323:

Description:
The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "*" is used in the LIKE clause, the expected results are not returned. This is because "*" is not escaped before being passed to the regex. Unescaped, a pattern such as "*f*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
Try this:
val p = "\*f\*".r

was: (the same text, with the last line written as: val p = "*f*".r)
[jira] [Updated] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14323:

Description:
The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "*" is used in the LIKE clause, the expected results are not returned. This is because "*" is not escaped before being passed to the regex. Unescaped, a pattern such as "*f*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
Try this:
val p = "*f*".r

was: (the same text, ending after "...not escaped before being passed to the regex.")
[jira] [Updated] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14323:

Description:
The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "\*" is used in the LIKE clause, the expected results are not returned. This is because "\*" is not escaped before being passed to the regex. Unescaped, a pattern such as "\*f\*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
Try this:
val p = "\*f\*".r

was: (the same text, with the wildcards in the prose written as unescaped *)
[jira] [Created] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
Bo Meng created SPARK-14323:

Summary: [SQL] SHOW FUNCTIONS did not work properly
Key: SPARK-14323
URL: https://issues.apache.org/jira/browse/SPARK-14323
Project: Spark
Issue Type: Bug
Reporter: Bo Meng
Priority: Minor

The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "*" is used in the LIKE clause, the expected results are not returned. This is because "*" is not escaped before being passed to the regex.
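The "Dangling meta character" failure described in this issue is easy to reproduce outside Spark; a minimal standalone sketch:

```scala
import java.util.regex.PatternSyntaxException
import scala.util.Try

// An unescaped '*' at the start of a pattern has nothing to repeat, so
// compiling it throws PatternSyntaxException ("Dangling meta character").
val bad = Try("*f*".r)
assert(bad.isFailure)
assert(bad.failed.get.isInstanceOf[PatternSyntaxException])

// Translating the SQL wildcard '*' to the regex '.*' first yields a valid
// pattern: "*f*" becomes ".*f.*", which matches any name containing 'f'.
val good = "*f*".replaceAll("\\*", ".*").r
assert(good.pattern.matcher("showfunctions").matches())
assert(!good.pattern.matcher("tables").matches())
```

This is the same escaping issue the later comments resolve by routing SHOW FUNCTIONS through the shared pattern-matching path.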
[jira] [Updated] (SPARK-14294) Support native execution of ALTER TABLE ... RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-14294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14294:

Description:
Support native execution of ALTER TABLE ... RENAME TO.
The syntax for ALTER TABLE ... RENAME TO commands is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable

was: (the same text, with "commands are described" instead of "commands is described")

> Support native execution of ALTER TABLE ... RENAME TO
> Key: SPARK-14294
> URL: https://issues.apache.org/jira/browse/SPARK-14294
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Bo Meng
> Priority: Minor
[jira] [Commented] (SPARK-14129) [Table related commands] Alter table
[ https://issues.apache.org/jira/browse/SPARK-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219883#comment-15219883 ] Bo Meng commented on SPARK-14129:

This is for ALTER TABLE ... RENAME TO. I will be working on the rest of them.

> [Table related commands] Alter table
> Key: SPARK-14129
> URL: https://issues.apache.org/jira/browse/SPARK-14129
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
>
> For the alter table command, we have the following tokens:
> TOK_ALTERTABLE_RENAME
> TOK_ALTERTABLE_LOCATION
> TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES
> TOK_ALTERTABLE_SERIALIZER
> TOK_ALTERTABLE_SERDEPROPERTIES
> TOK_ALTERTABLE_CLUSTER_SORT
> TOK_ALTERTABLE_SKEWED
> For a data source table, let's implement TOK_ALTERTABLE_RENAME, TOK_ALTERTABLE_LOCATION, and TOK_ALTERTABLE_SERDEPROPERTIES. We need to decide what we do for TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. It will be used to allow users to correct the data format (e.g. changing csv to com.databricks.spark.csv to allow the table to be accessed by older versions of Spark).
> For a Hive table, we should implement all commands supported by the data source table plus TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. For TOK_ALTERTABLE_CLUSTER_SORT and TOK_ALTERTABLE_SKEWED, we should throw exceptions.
[jira] [Created] (SPARK-14294) Support native execution of ALTER TABLE ... RENAME TO
Bo Meng created SPARK-14294:

Summary: Support native execution of ALTER TABLE ... RENAME TO
Key: SPARK-14294
URL: https://issues.apache.org/jira/browse/SPARK-14294
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Bo Meng
Priority: Minor

Support native execution of ALTER TABLE ... RENAME TO.
The syntax for ALTER TABLE ... RENAME TO commands is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable