[jira] [Commented] (SPARK-25078) Standalone does not work with spark.authenticate.secret and deploy-mode=cluster
[ https://issues.apache.org/jira/browse/SPARK-25078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581781#comment-16581781 ]

Bo Meng commented on SPARK-25078:
---------------------------------

What is your suggestion to improve this?

> Standalone does not work with spark.authenticate.secret and deploy-mode=cluster
> -------------------------------------------------------------------------------
>
> Key: SPARK-25078
> URL: https://issues.apache.org/jira/browse/SPARK-25078
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 2.4.0
> Reporter: Imran Rashid
> Priority: Major
>
> When running a Spark standalone cluster with spark.authenticate.secret set up,
> you cannot submit a program in cluster mode, even with the right secret. The
> driver fails with:
> {noformat}
> 18/08/09 08:17:21 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(systest); groups with view permissions: Set(); users with modify permissions: Set(systest); groups with modify permissions: Set()
> 18/08/09 08:17:21 ERROR SparkContext: Error initializing SparkContext.
> java.lang.IllegalArgumentException: requirement failed: A secret key must be specified via the spark.authenticate.secret config.
>         at scala.Predef$.require(Predef.scala:224)
>         at org.apache.spark.SecurityManager.initializeAuth(SecurityManager.scala:361)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:238)
>         at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
>         at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
>         ...
> {noformat}
> But it's actually doing the wrong check in {{SecurityManager.initializeAuth()}}.
> The secret is there; it's just in an environment variable, {{_SPARK_AUTH_SECRET}}
> (so it's not visible to another process).
>
> *Workaround*: in your program, you can pass a dummy secret into your Spark conf.
> Its value does not matter at all; it is ignored later, and when establishing
> connections the secret from the env variable is used. E.g.:
> {noformat}
> val conf = new SparkConf()
> conf.setIfMissing("spark.authenticate.secret", "doesn't matter")
> val sc = new SparkContext(conf)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
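The faulty check described above can be illustrated with a small plain-Scala sketch. This is a hedged, hypothetical model of the logic (`SecretCheck`, `secretAvailable`, and the `Map`-based conf/env are illustrative stand-ins, not Spark's actual `SecurityManager` internals): a secret should count as present whether it arrives via the conf key or via the env variable that cluster-mode drivers receive.

```scala
// Hypothetical sketch of the check SecurityManager.initializeAuth should make.
// In cluster mode the secret is delivered through _SPARK_AUTH_SECRET rather
// than the conf, so the check must accept either source. Names here are
// illustrative only.
object SecretCheck {
  val ConfKey = "spark.authenticate.secret"
  val EnvKey  = "_SPARK_AUTH_SECRET"

  def secretAvailable(conf: Map[String, String], env: Map[String, String]): Boolean =
    conf.get(ConfKey).orElse(env.get(EnvKey)).exists(_.nonEmpty)
}
```

Under this model, the reported failure is the conf-only lookup rejecting a run where only the env variable is set, which is exactly what the workaround's dummy conf value papers over.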
[jira] [Commented] (SPARK-22357) SparkContext.binaryFiles ignore minPartitions parameter
[ https://issues.apache.org/jira/browse/SPARK-22357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221078#comment-16221078 ]

Bo Meng commented on SPARK-22357:
---------------------------------

A quick fix could be as follows; correct me if I am wrong:

val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)

> SparkContext.binaryFiles ignore minPartitions parameter
> --------------------------------------------------------
>
> Key: SPARK-22357
> URL: https://issues.apache.org/jira/browse/SPARK-22357
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.2, 2.2.0
> Reporter: Weichen Xu
>
> This is a bug in binaryFiles: even though we give it the partitions, binaryFiles ignores them.
> The bug was introduced in Spark 2.1 (coming from Spark 2.0): in the file PortableDataStream.scala, the argument "minPartitions" is no longer used (with the push to master on 11/7/6):
> {code}
> /**
>  * Allow minPartitions set by end-user in order to keep compatibility with old
>  * Hadoop API, which is set through setMaxSplitSize.
>  */
> def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
>   val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
>   val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
>   val defaultParallelism = sc.defaultParallelism
>   val files = listStatus(context).asScala
>   val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
>   val bytesPerCore = totalBytes / defaultParallelism
>   val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
>   super.setMaxSplitSize(maxSplitSize)
> }
> {code}
> The code previously, in version 2.0, was:
> {code}
> def setMinPartitions(context: JobContext, minPartitions: Int) {
>   val totalLen = listStatus(context).asScala.filterNot(_.isDirectory).map(_.getLen).sum
>   val maxSplitSize = math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
>   super.setMaxSplitSize(maxSplitSize)
> }
> {code}
> The new code is very smart, but it ignores
> what the user passes in and uses the data size instead, which is a breaking change in some sense.
> In our specific case this was a problem: we initially read in just the file names, and only after that does the dataframe become very large, when the images themselves are read in; in this case the new code does not handle the partitioning very well.
> I'm not sure whether it can be easily fixed, because I don't understand the full context of the change in Spark (but at the very least the unused parameter should be removed to avoid confusion).
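Bo Meng's one-line suggestion can be sketched in isolation. The standalone helper below is a hypothetical re-derivation of the Spark 2.1 split-size computation with the proposed change folded in (the function, its parameters, and the sample numbers are illustrative; the real method lives in PortableDataStream.scala and pulls its inputs from the SparkConf):

```scala
// Sketch of the suggested fix: fold the user's minPartitions into the
// parallelism used for the bytes-per-core estimate, so a larger minPartitions
// yields a smaller max split size and therefore more partitions.
def maxSplitSize(totalBytes: Long,
                 defaultMaxSplitBytes: Long,
                 openCostInBytes: Long,
                 defaultParallelism: Int,
                 minPartitions: Int): Long = {
  // The proposed change: previously this was `sc.defaultParallelism` alone,
  // which is why minPartitions was silently ignored.
  val parallelism = Math.max(defaultParallelism, minPartitions)
  val bytesPerCore = totalBytes / parallelism
  Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
}
```

With 1000 total bytes, a 128-byte cap, a 4-byte open cost, and default parallelism 2, asking for minPartitions = 10 shrinks the split size to 100 bytes, whereas minPartitions = 1 leaves it capped at 128; under the unpatched formula both calls would behave identically.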
[jira] [Commented] (SPARK-20145) "SELECT * FROM range(1)" works, but "SELECT * FROM RANGE(1)" doesn't
[ https://issues.apache.org/jira/browse/SPARK-20145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949588#comment-15949588 ]

Bo Meng commented on SPARK-20145:
---------------------------------

You do not need to be assigned. Just go ahead and provide your solution in a PR.

> "SELECT * FROM range(1)" works, but "SELECT * FROM RANGE(1)" doesn't
> --------------------------------------------------------------------
>
> Key: SPARK-20145
> URL: https://issues.apache.org/jira/browse/SPARK-20145
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Juliusz Sompolski
>
> Executed at the clean tip of the master branch, with all default settings:
> scala> spark.sql("SELECT * FROM range(1)")
> res1: org.apache.spark.sql.DataFrame = [id: bigint]
> scala> spark.sql("SELECT * FROM RANGE(1)")
> org.apache.spark.sql.AnalysisException: could not resolve `RANGE` to a table-valued function; line 1 pos 14
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions$$anonfun$apply$1.applyOrElse(ResolveTableValuedFunctions.scala:126)
>   at org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions$$anonfun$apply$1.applyOrElse(ResolveTableValuedFunctions.scala:106)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   ...
> I believe it should be case insensitive?
[jira] [Commented] (SPARK-20145) "SELECT * FROM range(1)" works, but "SELECT * FROM RANGE(1)" doesn't
[ https://issues.apache.org/jira/browse/SPARK-20145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947726#comment-15947726 ]

Bo Meng commented on SPARK-20145:
---------------------------------

From the current code, I can see that builtinFunctions uses an exact match for the lookup ("range" as a key is all lowercase).
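The exact-match lookup Bo Meng points at, and the obvious fix, can be sketched with a plain `Map`. This is a hedged stand-in (the `Map[String, String]` shape and the value string are illustrative; the real table in ResolveTableValuedFunctions maps names to builder functions): normalizing the key before the lookup makes `RANGE` resolve the same way as `range`.

```scala
import java.util.Locale

// Illustrative stand-in for the builtinFunctions table, which keys
// table-valued functions by their lowercase name.
val builtinFunctions: Map[String, String] = Map("range" -> "RangeTableValuedFunction")

// Exact match, as in the current code: fails for "RANGE".
def lookupExact(name: String): Option[String] =
  builtinFunctions.get(name)

// Case-insensitive match: lowercase the key before looking it up.
def lookupCaseInsensitive(name: String): Option[String] =
  builtinFunctions.get(name.toLowerCase(Locale.ROOT))
```

Using `Locale.ROOT` avoids locale-dependent lowercasing surprises (e.g. the Turkish dotless i) for identifier-style keys.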
[jira] [Created] (SPARK-20146) Column comment information is missing for Thrift Server's TableSchema
Bo Meng created SPARK-20146:
----------------------------

Summary: Column comment information is missing for Thrift Server's TableSchema
Key: SPARK-20146
URL: https://issues.apache.org/jira/browse/SPARK-20146
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.2.0
Reporter: Bo Meng
Priority: Minor

I found this issue while doing some tests against the Thrift Server. The column comment information was missing when querying the TableSchema; currently, all comments are ignored. I will post a fix shortly.
[jira] [Comment Edited] (SPARK-20004) Spark thrift server overwrites spark.app.name
[ https://issues.apache.org/jira/browse/SPARK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935472#comment-15935472 ]

Bo Meng edited comment on SPARK-20004 at 3/21/17 10:32 PM:
-----------------------------------------------------------

I think you can still use --name for your app name. For example:

/spark/sbin/start-thriftserver.sh --name="My server 1"

was (Author: bomeng):
I think you can still use --name for your app name. For example:

/spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host"

> Spark thrift server overwrites spark.app.name
> ---------------------------------------------
>
> Key: SPARK-20004
> URL: https://issues.apache.org/jira/browse/SPARK-20004
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Egor Pahomov
> Priority: Minor
>
> {code}
> export SPARK_YARN_APP_NAME="ODBC server $host"
> /spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host"
> {code}
> And spark-defaults.conf contains:
> {code}
> spark.app.name "ODBC server spark01"
> {code}
> Still, the name in YARN is "Thrift JDBC/ODBC Server"
[jira] [Commented] (SPARK-20004) Spark thrift server overwrites spark.app.name
[ https://issues.apache.org/jira/browse/SPARK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935472#comment-15935472 ]

Bo Meng commented on SPARK-20004:
---------------------------------

I think you can still use --name for your app name. For example:

/spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host"
[jira] [Comment Edited] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng edited comment on SPARK-16173 at 6/23/16 9:58 PM:
----------------------------------------------------------

Using the latest master, I was not able to reproduce it; here is my code:
{quote}
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
{quote}
Anything I am missing? I am using Scala 2.11; does it only happen with Scala 2.10?

was (Author: bomeng):
Using the latest master, I was not able to reproduce it; here is my code:
{quote}
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
{quote}
Anything I am missing?

> Can't join describe() of DataFrame in Scala 2.10
> -------------------------------------------------
>
> Key: SPARK-16173
> URL: https://issues.apache.org/jira/browse/SPARK-16173
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.1, 2.0.0
> Reporter: Davies Liu
>
> describe() of DataFrame uses Seq() (it is actually an Iterator) to create another DataFrame, which cannot be serialized in Scala 2.10.
> {code} > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304) > at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2060) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) > at > org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashJoin.scala:82) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashJoin.scala:79) > at > org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashJoin.scala:79) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashJoin.scala:79) > at > 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.NotSerializableException: > scala.collection.Iterator$$anon$11 > Serialization stack: > - object not serializable (class: scala.collection.Iterator$$anon$11, > value: empty iterator) > - field (class: scala.collection.Iterator$$anonfun$toStream$1, name: > $outer, type: interface scala.collection.Iterator) > - object (class scala.collection.Iterator$$anonfun$toStream$1, > ) > - field (class: scala.collection.immutable.Stream$Cons, name: tl, type: > interface scala.Function0) > - object (class scala.collection.immutable.Stream$Cons, > Stream(WrappedArray(1), WrappedArray(2.0), WrappedArray(NaN), > WrappedArray(2), WrappedArray(2))) > - field (class: scala.collection.immutable.Stream$$anonfun$zip$1, name: > $outer, type: class scala.collection.immutable.Stream) > - object (class scala.collection.immutable.Stream$$anonfun$zip$1, > ) > - field (class: scala.collection.immutable.Stream$Cons, name: tl, type: > interface scala.Function0) > - object (class
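The root cause in the trace above is a non-serializable `scala.collection.Iterator` anonymous class captured inside a `Stream`. That failure mode can be probed without Spark at all, using plain Java serialization (the same mechanism Spark uses when shipping task closures). A minimal, self-contained sketch; the helper name is illustrative:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Probe whether a value survives Java serialization. Spark's task shipping
// fails with exactly this NotSerializableException when a closure drags in
// a non-serializable object such as an Iterator.
def isJavaSerializable(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }
```

A materialized collection such as a `List` serializes fine, while an un-materialized `Iterator` generally does not, which matches the `NotSerializableException: scala.collection.Iterator$$anon$11` in the trace and explains why forcing the intermediate rows into a concrete collection sidesteps the bug.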
[jira] [Updated] (SPARK-16004) Improve CatalogTable information
[ https://issues.apache.org/jira/browse/SPARK-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Meng updated SPARK-16004:
----------------------------

Description:
A few issues were found when running the "describe extended | formatted [tableName]" command:
1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does.
2. Comment fields display "null" instead of an empty string when the comment is None.
I will make a PR shortly.

was:
A few issues were found when running the "describe extended | formatted [tableName]" command:
1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does.
2. Owner is always empty, instead of the current login user who created the table.
3. Comment fields display "null" instead of an empty string when the comment is None.
I will make a PR shortly.

> Improve CatalogTable information
> --------------------------------
>
> Key: SPARK-16004
> URL: https://issues.apache.org/jira/browse/SPARK-16004
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Bo Meng
>
> A few issues were found when running the "describe extended | formatted [tableName]" command:
> 1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does.
> 2. Comment fields display "null" instead of an empty string when the comment is None.
> I will make a PR shortly.
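Point 1 above is the classic unset-epoch symptom: an unset last-access time is stored as 0, and formatting that instant in a US timezone renders as the moment just before the 1970 UTC epoch. A hedged sketch of the display-side fix (the helper is hypothetical, not the actual Spark or Hive code):

```scala
import java.util.Date

// Render an unset last-access time as "UNKNOWN" instead of formatting the
// epoch instant (which shows up as "Wed Dec 31 15:59:59 PST 1969" in a
// US-Pacific locale). Illustrative helper only.
def formatLastAccess(lastAccessMillis: Long): String =
  if (lastAccessMillis <= 0) "UNKNOWN"
  else new Date(lastAccessMillis).toString
```

Treating any non-positive value as "unset" also covers implementations that use -1 as the sentinel.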
[jira] [Comment Edited] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng edited comment on SPARK-16173 at 6/23/16 9:38 PM:
----------------------------------------------------------

Using the latest master, I was not able to reproduce it; here is my code:
{quote}
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
{quote}
Anything I am missing?

was (Author: bomeng):
Using the latest master, I was not able to reproduce it; here is my code:
{{
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
}}
Anything I am missing?
[jira] [Comment Edited] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng edited comment on SPARK-16173 at 6/23/16 9:37 PM:
----------------------------------------------------------

Using the latest master, I was not able to reproduce it; here is my code:
{{
val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()
}}
Anything I am missing?

was (Author: bomeng):
Using the latest master, I was not able to reproduce it; here is my code:

val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()

Anything I am missing?
[jira] [Commented] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347255#comment-15347255 ]

Bo Meng commented on SPARK-16173:
---------------------------------

Using the latest master, I was not able to reproduce it; here is my code:

val a = Seq(("Alice", 1)).toDF("name", "age").describe()
val b = Seq(("Bob", 2)).toDF("name", "grade").describe()
a.show()
b.show()
a.join(b, Seq("summary")).show()

Anything I am missing?
[jira] [Commented] (SPARK-15230) Back quoted column with dot in it fails when running distinct on dataframe
[ https://issues.apache.org/jira/browse/SPARK-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346981#comment-15346981 ] Bo Meng commented on SPARK-15230: - Can anyone update the "Component/s" of this JIRA? It should belong to "SQL", not "Examples". Thanks. > Back quoted column with dot in it fails when running distinct on dataframe > -- > > Key: SPARK-15230 > URL: https://issues.apache.org/jira/browse/SPARK-15230 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 1.6.0 >Reporter: Barry Becker >Assignee: Bo Meng > Fix For: 2.0.1 > > > When working with a dataframe, columns with dots in their names must be backquoted > (``) or the column name will not be found. This works for most dataframe > methods, but I discovered that it does not work for distinct(). > Suppose you have a dataFrame, testDf, with a DoubleType column named > {{pos.NoZero}}. This statement: > {noformat} > testDf.select(new Column("`pos.NoZero`")).distinct().collect().mkString(", ") > {noformat} > will fail with this error: > {noformat} > org.apache.spark.sql.AnalysisException: Cannot resolve column name > "pos.NoZero" among (pos.NoZero); > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1328) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2165) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1328) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1348) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1319) > at org.apache.spark.sql.DataFrame.distinct(DataFrame.scala:1612) > at > com.mineset.spark.vizagg.selection.SelectionExpressionSuite$$anonfun$40.apply$mcV$sp(SelectionExpressionSuite.scala:317) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
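The failure above comes down to how a column reference is resolved: a backquoted name must be treated as one literal column, while an unquoted dotted name means nesting. A minimal, hypothetical sketch of that distinction (helper names are illustrative, not Spark's actual resolver):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the idea behind the fix: a backquoted name such as
// `pos.NoZero` resolves as one literal column, while an unquoted pos.NoZero
// is a dot-separated nested-field path. dropDuplicates() hit the second path.
public class ColumnNames {
    static List<String> parse(String name) {
        if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
            // quoted: strip the backquotes, never split on dots
            return List.of(name.substring(1, name.length() - 1));
        }
        // unquoted: a dot separates nesting levels
        return Arrays.asList(name.split("\\."));
    }

    public static void main(String[] args) {
        // quoted form: one literal column name containing a dot
        if (!parse("`pos.NoZero`").equals(List.of("pos.NoZero")))
            throw new AssertionError();
        // unquoted form: (mis)read as a two-part path -- the failure mode above
        if (!parse("pos.NoZero").equals(List.of("pos", "NoZero")))
            throw new AssertionError();
    }
}
```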
[jira] [Commented] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341177#comment-15341177 ] Bo Meng commented on SPARK-16084: - Got it! > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Priority: Trivial > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341167#comment-15341167 ] Bo Meng commented on SPARK-16084: - OK! Next time I will send it to the mailing list if it is a doc change. > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Priority: Trivial > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341141#comment-15341141 ] Bo Meng commented on SPARK-16084: - [~sowen] Please let me know how to handle it next time, if I should not create a JIRA. Thanks. > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Priority: Trivial > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16084) Minor javadoc issue with "Describe" table in the parser
[ https://issues.apache.org/jira/browse/SPARK-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-16084: Summary: Minor javadoc issue with "Describe" table in the parser (was: Minor issue with "Describe" table in the parser) > Minor javadoc issue with "Describe" table in the parser > --- > > Key: SPARK-16084 > URL: https://issues.apache.org/jira/browse/SPARK-16084 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16084) Minor issue with "Describe" table in the parser
Bo Meng created SPARK-16084: --- Summary: Minor issue with "Describe" table in the parser Key: SPARK-16084 URL: https://issues.apache.org/jira/browse/SPARK-16084 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Minor The comments need a minor change - we now support FORMATTED with DESCRIBE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16004) Improve CatalogTable information
Bo Meng created SPARK-16004: --- Summary: Improve CatalogTable information Key: SPARK-16004 URL: https://issues.apache.org/jira/browse/SPARK-16004 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng A few issues found when running the "describe extended | formatted [tableName]" command: 1. The last access time is incorrectly displayed as something like "Last Access Time: |Wed Dec 31 15:59:59 PST 1969"; I think we should display it as "UNKNOWN", as Hive does; 2. The owner is always empty, instead of the current login user who created the table; 3. The comment field displays "null" instead of an empty string when the comment is None. I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
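The odd "Wed Dec 31 15:59:59 PST 1969" value is what java.util.Date prints for a non-positive epoch offset (here -1 ms, i.e. "never accessed") on a US/Pacific JVM. A hedged sketch of the proposed fix, with a hypothetical helper name (not the actual Spark code):

```java
import java.util.Date;

// Hypothetical helper illustrating the fix: the metastore stores "never
// accessed" as 0 (or -1) millis, which java.util.Date renders as an instant
// just before the epoch. Mapping non-positive values to "UNKNOWN" matches
// what Hive displays.
public class LastAccess {
    static String format(long millis) {
        return millis <= 0 ? "UNKNOWN" : new Date(millis).toString();
    }

    public static void main(String[] args) {
        if (!format(-1L).equals("UNKNOWN")) throw new AssertionError();
        if (!format(0L).equals("UNKNOWN")) throw new AssertionError();
    }
}
```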
[jira] [Created] (SPARK-15978) Some improvement of "Show Tables"
Bo Meng created SPARK-15978: --- Summary: Some improvement of "Show Tables" Key: SPARK-15978 URL: https://issues.apache.org/jira/browse/SPARK-15978 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Minor I've found some minor issues in the "show tables" command: 1. In SessionCatalog.scala, the listTables(db: String) method calls listTables(formatDatabaseName(db), "*") to list all the tables for a given db, but in listTables(db: String, pattern: String) the db name is formatted once more. So I think we should remove formatDatabaseName() from the caller. 2. I suggest adding a sort to listTables(db: String) in InMemoryCatalog.scala, just like listDatabases(). I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15952) "show databases" does not get sorted result
Bo Meng created SPARK-15952: --- Summary: "show databases" does not get sorted result Key: SPARK-15952 URL: https://issues.apache.org/jira/browse/SPARK-15952 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Two issues I've found with the "show databases" command: 1. The returned database name list is not sorted; it is only sorted when a "like" pattern is used with it (Hive always returns a sorted list). 2. sql("show databases").show outputs a table with a column named "result", but sql("show tables").show outputs the column name "tableName"; to be consistent, we should use "databaseName" at least. I will make a PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
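The first point above can be sketched as: sort the names unconditionally, then apply the optional "like" pattern, rather than only getting ordered output on the pattern path. This is a hypothetical sketch of the idea, not the catalog code (and it assumes a simple '*'-only wildcard):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Minimal sketch (hypothetical, not Spark's InMemoryCatalog): always sort the
// database names, then filter by the optional "like" pattern, where '*'
// matches any character sequence. Sorting is independent of the pattern path.
public class ShowDatabases {
    static List<String> list(List<String> all, String pattern) {
        Stream<String> sorted = all.stream().sorted();
        if (pattern != null) {
            // naive '*' -> '.*' translation; assumes no other regex metacharacters
            Pattern p = Pattern.compile(pattern.replace("*", ".*"));
            sorted = sorted.filter(n -> p.matcher(n).matches());
        }
        return sorted.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dbs = List.of("sales", "default", "archive");
        if (!list(dbs, null).equals(List.of("archive", "default", "sales")))
            throw new AssertionError();
        if (!list(dbs, "d*").equals(List.of("default")))
            throw new AssertionError();
    }
}
```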
[jira] [Commented] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00
[ https://issues.apache.org/jira/browse/SPARK-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323442#comment-15323442 ] Bo Meng commented on SPARK-13268: - Why is this related to Spark? The conversion does not use any Spark function and I think the conversion loses the time zone information along the way. > SQL Timestamp stored as GMT but toString returns GMT-08:00 > -- > > Key: SPARK-13268 > URL: https://issues.apache.org/jira/browse/SPARK-13268 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Ilya Ganelin > > There is an issue with how timestamps are displayed/converted to Strings in > Spark SQL. The documentation states that the timestamp should be created in > the GMT time zone, however, if we do so, we see that the output actually > contains a -8 hour offset: > {code} > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) > res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) > res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 > {code} > This result is confusing, unintuitive, and introduces issues when converting > from DataFrames containing timestamps to RDDs which are then saved as text. > This has the effect of essentially shifting all dates in a dataset by 1 day. > The suggested fix for this is to update the timestamp toString representation > to either a) Include timezone or b) Correctly display in GMT. > This change may well introduce substantial and insidious bugs so I'm not sure > how best to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
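The comment's point can be demonstrated without Spark: java.sql.Timestamp stores only an instant (millis since the epoch), and its toString renders that instant in the JVM's default time zone, so the zone used at parse time is not retained. A small JDK-only sketch:

```java
import java.sql.Timestamp;
import java.time.Instant;

// java.sql.Timestamp holds a zone-less instant; toString formats it in the
// JVM's default time zone, which is where the apparent "-8 hour shift" in the
// report comes from.
public class TsDemo {
    public static void main(String[] args) {
        Timestamp ts =
            new Timestamp(Instant.parse("2015-01-01T00:00:00Z").toEpochMilli());
        // The stored instant is exact and zone-independent...
        if (ts.getTime() != 1420070400000L) throw new AssertionError();
        // ...but the printed form depends on the default zone: it reads
        // "2015-01-01 00:00:00.0" only on a UTC JVM, and a day earlier
        // (2014-12-31 16:00:00.0) on a US/Pacific JVM.
        System.out.println(ts);
    }
}
```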
[jira] [Commented] (SPARK-15613) Incorrect days to millis conversion
[ https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323386#comment-15323386 ] Bo Meng commented on SPARK-15613: - Does this only happen on 1.6? I have tried the latest master and it does not have this issue. > Incorrect days to millis conversion > > > Key: SPARK-15613 > URL: https://issues.apache.org/jira/browse/SPARK-15613 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: java version "1.8.0_91" >Reporter: Dmitry Bushev > > There is an issue with the {{DateTimeUtils.daysToMillis}} implementation. It > affects {{DateTimeUtils.toJavaDate}} and ultimately CatalystTypeConverter, > i.e. the conversion of a date stored as {{Int}} days from the epoch in InternalRow > to the {{java.sql.Date}} of the Row returned to the user. > > The issue can be reproduced with this test (all the following tests are in my > default timezone, Europe/Moscow): > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, > 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, > 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, > 14332, 14696, 15060) > {code} > For example, for day {{4108}} of the epoch, the correct date should be > {{1981-04-01}} > {code} > scala> DateTimeUtils.toJavaDate(4107) > res25: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4108) > res26: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4109) > res27: java.sql.Date = 1981-04-02 > {code} > There was a previous unsuccessful attempt to work around the problem in > SPARK-11415. It seems that the issue involves flaws in the java date implementation > and I don't see how it can be fixed without third-party libraries. > I was not able to identify the library of choice for Spark. 
The following > implementation uses [JSR-310|http://www.threeten.org/] > {code} > def millisToDays(millisUtc: Long): SQLDate = { > val instant = Instant.ofEpochMilli(millisUtc) > val zonedDateTime = instant.atZone(ZoneId.systemDefault) > zonedDateTime.toLocalDate.toEpochDay.toInt > } > def daysToMillis(days: SQLDate): Long = { > val localDate = LocalDate.ofEpochDay(days) > val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault) > zonedDateTime.toInstant.toEpochMilli > } > {code} > that produces correct results: > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res37: scala.collection.immutable.IndexedSeq[Int] = Vector() > scala> new java.sql.Date(daysToMillis(4108)) > res36: java.sql.Date = 1981-04-01 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
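The java.time-based implementation quoted above can be checked for the round-trip property with plain JDK code; here it is transcribed to Java (the day value is an int count of days since the epoch, like Spark's SQLDate):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Java transcription of the java.time-based implementation quoted above, so
// the round-trip property millisToDays(daysToMillis(d)) == d can be verified.
public class DayMillis {
    static int millisToDays(long millisUtc) {
        return (int) Instant.ofEpochMilli(millisUtc)
                .atZone(ZoneId.systemDefault()).toLocalDate().toEpochDay();
    }

    static long daysToMillis(int days) {
        return LocalDate.ofEpochDay(days)
                .atStartOfDay(ZoneId.systemDefault()).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Holds in any JVM default zone: atStartOfDay resolves a DST gap at
        // midnight to the first valid instant of the same local date, so the
        // round trip cannot drift by a day.
        for (int d = 0; d <= 20000; d++) {
            if (millisToDays(daysToMillis(d)) != d) {
                throw new AssertionError("round trip failed at day " + d);
            }
        }
    }
}
```

Note that day 4108 (the failing example above) round-trips correctly under this implementation.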
[jira] [Comment Edited] (SPARK-15613) Incorrect days to millis conversion
[ https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323386#comment-15323386 ] Bo Meng edited comment on SPARK-15613 at 6/9/16 9:24 PM: - Does this only happen on 1.6? I have tried the latest master and it does not have this issue. I have not tried on 1.6. was (Author: bomeng): Does this only happen on 1.6? I have tried the latest master and it does not have this issue. > Incorrect days to millis conversion > > > Key: SPARK-15613 > URL: https://issues.apache.org/jira/browse/SPARK-15613 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: java version "1.8.0_91" >Reporter: Dmitry Bushev > > There is an issue with the {{DateTimeUtils.daysToMillis}} implementation. It > affects {{DateTimeUtils.toJavaDate}} and ultimately CatalystTypeConverter, > i.e. the conversion of a date stored as {{Int}} days from the epoch in InternalRow > to the {{java.sql.Date}} of the Row returned to the user. > > The issue can be reproduced with this test (all the following tests are in my > default timezone, Europe/Moscow): > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, > 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, > 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, > 14332, 14696, 15060) > {code} > For example, for day {{4108}} of the epoch, the correct date should be > {{1981-04-01}} > {code} > scala> DateTimeUtils.toJavaDate(4107) > res25: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4108) > res26: java.sql.Date = 1981-03-31 > scala> DateTimeUtils.toJavaDate(4109) > res27: java.sql.Date = 1981-04-02 > {code} > There was a previous unsuccessful attempt to work around the problem in > SPARK-11415. 
It seems that the issue involves flaws in the java date implementation > and I don't see how it can be fixed without third-party libraries. > I was not able to identify the library of choice for Spark. The following > implementation uses [JSR-310|http://www.threeten.org/] > {code} > def millisToDays(millisUtc: Long): SQLDate = { > val instant = Instant.ofEpochMilli(millisUtc) > val zonedDateTime = instant.atZone(ZoneId.systemDefault) > zonedDateTime.toLocalDate.toEpochDay.toInt > } > def daysToMillis(days: SQLDate): Long = { > val localDate = LocalDate.ofEpochDay(days) > val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault) > zonedDateTime.toInstant.toEpochMilli > } > {code} > that produces correct results: > {code} > scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) > yield days > res37: scala.collection.immutable.IndexedSeq[Int] = Vector() > scala> new java.sql.Date(daysToMillis(4108)) > res36: java.sql.Date = 1981-04-01 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14923) Support "Extended" in "Describe" table DDL
[ https://issues.apache.org/jira/browse/SPARK-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng closed SPARK-14923. --- Resolution: Fixed > Support "Extended" in "Describe" table DDL > -- > > Key: SPARK-14923 > URL: https://issues.apache.org/jira/browse/SPARK-14923 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently, {{Extended}} keywords in {{Describe [Extended] }} DDL > is simply ignored. This JIRA is to bring it back with the similar behavior as > Hive does. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15806) Update doc for SPARK_MASTER_IP
Bo Meng created SPARK-15806: --- Summary: Update doc for SPARK_MASTER_IP Key: SPARK-15806 URL: https://issues.apache.org/jira/browse/SPARK-15806 Project: Spark Issue Type: Bug Components: Documentation Reporter: Bo Meng Priority: Minor SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15755) java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-15755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318692#comment-15318692 ] Bo Meng commented on SPARK-15755: - Could you provide a test case to reproduce the issue? > java.lang.NullPointerException when run spark 2.0 setting > spark.serializer=org.apache.spark.serializer.KryoSerializer > - > > Key: SPARK-15755 > URL: https://issues.apache.org/jira/browse/SPARK-15755 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > java.lang.NullPointerException when run spark 2.0 setting > spark.serializer=org.apache.spark.serializer.KryoSerializer > 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157) > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148) > at scala.math.Ordering$$anon$4.compare(Ordering.scala:111) > at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) > at java.util.PriorityQueue.siftUp(PriorityQueue.java:627) > at java.util.PriorityQueue.offer(PriorityQueue.java:329) > at java.util.PriorityQueue.add(PriorityQueue.java:306) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:711) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) > ... 
15 more > 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at >
[jira] [Commented] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes
[ https://issues.apache.org/jira/browse/SPARK-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313230#comment-15313230 ] Bo Meng commented on SPARK-15732: - There is no easy way to work around this issue, since "abstract" is a keyword in Java and Java does not allow keywords as identifiers. Renaming the field is the best workaround I can think of. > Dataset generated code "generated.java" Fails with Certain Case Classes > --- > > Key: SPARK-15732 > URL: https://issues.apache.org/jira/browse/SPARK-15732 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Version 2.0 Preview on the Databricks Community Edition >Reporter: Sanjay Dasgupta >Priority: Critical > > The Dataset code generation logic fails to handle field-names in case classes > that are also Java keywords (e.g. "abstract"). Scala has an escaping > mechanism (using backquotes) that allows Java (and Scala) keywords to be used > as names in programs, as in the example below: > case class PatApp(number: Int, title: String, `abstract`: String) > But this case class trips up the Dataset code generator. The following error > message is displayed when Datasets containing instances of such case classes > are processed. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in > stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 > (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "." > The following code can be used to replicate the problem. This code was run on > the Databricks CE, in a Scala notebook, in 3 separate cells as shown below: > // CELL 1: > // > // Create a Case Class with "abstract" as a field-name ... 
> // > package keywordissue > // The field-name abstract is a Java keyword ... > case class PatApp(number: Int, title: String, `abstract`: String) > // CELL 2: > // > // Create a Dataset using the case class ... > // > import keywordissue.PatApp > val applications = List(PatApp(1001, "1001", "Abstract 1001"), PatApp(1002, > "1002", "Abstract 1002"), PatApp(1003, "1003", "Abstract for 1003"), > PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004")) > val appsDataset = sc.parallelize(applications).toDF.as[PatApp] > // CELL 3: > // > // Force Dataset code-generation. This causes the error message to display ... > // > val duplicates = appsDataset.groupByKey(_.number).mapGroups((k, i) => (k, > i.length)).filter(_._2 > 0) > duplicates.collect().foreach(println) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
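The constraint the comment relies on, that Java rejects keywords as identifiers, can be checked directly with the JDK's in-memory compiler API. This is an illustrative sketch (not Spark's codegen) showing that the field declaration the generated Java would need cannot compile, while a renamed field compiles fine:

```java
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

// Sketch of why Dataset codegen fails for a `abstract` field: the generated
// Java source must name the field "abstract", which the Java compiler rejects.
// (Requires a JDK, since it uses the system Java compiler.)
public class KeywordDemo {
    // In-memory Java source file for the compiler API
    static class Src extends SimpleJavaFileObject {
        final String code;
        Src(String code) {
            super(URI.create("string:///T.java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    static boolean compiles(String source) {
        JavaCompiler jc = ToolProvider.getSystemJavaCompiler();
        // diagnostic listener swallows the expected error messages
        return jc.getTask(null, null, diag -> { }, null, null,
                List.of(new Src(source))).call();
    }

    public static void main(String[] args) {
        // "abstract" is a reserved word, so this field declaration is rejected
        if (compiles("class T { String abstract; }"))
            throw new AssertionError("expected compilation to fail");
        // renaming the field -- the suggested workaround -- compiles fine
        if (!compiles("class T { String abstractField; }"))
            throw new AssertionError("expected compilation to succeed");
    }
}
```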
[jira] [Updated] (SPARK-15737) Fix Jetty server start warning
[ https://issues.apache.org/jira/browse/SPARK-15737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-15737: Component/s: (was: SQL) Spark Core > Fix Jetty server start warning > -- > > Key: SPARK-15737 > URL: https://issues.apache.org/jira/browse/SPARK-15737 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Bo Meng >Priority: Minor > > While running any test cases, you will always see something like > "14:23:10.834 WARN org.eclipse.jetty.server.handler.AbstractHandler: No > Server set for org.eclipse.jetty.server.handler.ErrorHandler@76884e4b". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15737) Fix Jetty server start warning
Bo Meng created SPARK-15737: --- Summary: Fix Jetty server start warning Key: SPARK-15737 URL: https://issues.apache.org/jira/browse/SPARK-15737 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Priority: Minor While running any test cases, you will always see something like "14:23:10.834 WARN org.eclipse.jetty.server.handler.AbstractHandler: No Server set for org.eclipse.jetty.server.handler.ErrorHandler@76884e4b". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14752) LazilyGenerateOrdering throws NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-14752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312779#comment-15312779 ] Bo Meng commented on SPARK-14752: - I think this is a good approach. Thanks. > LazilyGenerateOrdering throws NullPointerException > -- > > Key: SPARK-14752 > URL: https://issues.apache.org/jira/browse/SPARK-14752 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Rajesh Balamohan > > codebase: spark master > DataSet: TPC-DS > Client: $SPARK_HOME/bin/beeline > Example query to reproduce the issue: > select i_item_id from item order by i_item_id limit 10; > Explain plan output > {noformat} > explain select i_item_id from item order by i_item_id limit 10; > +--+--+ > | > plan > > | > +--+--+ > | == Physical Plan == > TakeOrderedAndProject(limit=10, orderBy=[i_item_id#1229 ASC], > output=[i_item_id#1229]) > +- WholeStageCodegen >: +- Project [i_item_id#1229] >: +- Scan HadoopFiles[i_item_id#1229] Format: ORC, PushedFilters: [], > ReadSchema: struct | > +--+--+ > {noformat} > Exception: > {noformat} > TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1791) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157) > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148) > at scala.math.Ordering$$anon$4.compare(Ordering.scala:111) > at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:669) > at java.util.PriorityQueue.siftUp(PriorityQueue.java:645) > at java.util.PriorityQueue.offer(PriorityQueue.java:344) > at java.util.PriorityQueue.add(PriorityQueue.java:321) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708) > at >
[jira] [Created] (SPARK-15537) clean up the temp folders after finishing the tests
Bo Meng created SPARK-15537: --- Summary: clean up the temp folders after finishing the tests Key: SPARK-15537 URL: https://issues.apache.org/jira/browse/SPARK-15537 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Some of the test cases, e.g. OrcSourceSuite, create temp folders and temp files inside them, but the folders are not removed after the tests finish. If we keep running the test cases, this leaves lots of temp files behind and occupies disk space. The reason is that dir.delete() does not work if dir is not empty; we need to recursively delete the contents before deleting the folder itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
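The recursive deletion described above can be sketched as follows. This is a minimal standalone helper for illustration, not Spark's actual cleanup utility; the class and file names are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RecursiveDelete {
    // File.delete() fails on a non-empty directory, so delete the
    // children first, then the (now empty) directory itself.
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null if f is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        // Mimic what a test suite leaves behind: a temp dir with a file in it.
        File dir = Files.createTempDirectory("orc-test").toFile();
        new File(dir, "part-00000.orc").createNewFile();

        System.out.println(dir.delete());           // false: dir is not empty
        System.out.println(deleteRecursively(dir)); // true: recursive delete works
        System.out.println(dir.exists());           // false
    }
}
```

The first call demonstrates the bug described in the issue: a plain delete() on the non-empty temp folder silently fails, which is why the folders accumulate across test runs.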
[jira] [Created] (SPARK-15468) fix some typos while browsing the codes
Bo Meng created SPARK-15468: --- Summary: fix some typos while browsing the codes Key: SPARK-15468 URL: https://issues.apache.org/jira/browse/SPARK-15468 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Minor Found some typos while browsing the codes briefly.
[jira] [Comment Edited] (SPARK-15230) Back quoted column with dot in it fails when running distinct on dataframe
[ https://issues.apache.org/jira/browse/SPARK-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285302#comment-15285302 ] Bo Meng edited comment on SPARK-15230 at 5/16/16 9:11 PM: -- In the description, {{it does not work for describe()}} should be {{it does not work for distinct()}}, please update the description, thanks. was (Author: bomeng): In the description, `it does not work for describe()` should be `it does not work for distinct()`, please update the description, thanks. > Back quoted column with dot in it fails when running distinct on dataframe > -- > > Key: SPARK-15230 > URL: https://issues.apache.org/jira/browse/SPARK-15230 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 1.6.0 >Reporter: Barry Becker > > When working with a dataframe columns with .'s in them must be backquoted > (``) or the column name will not be found. This works for most dataframe > methods, but I discovered that it does not work for describe(). > Suppose you have a dataFrame, testDf, with a DoubleType column named > {{pos.NoZero}}. 
This statement: > {noformat} > testDf.select(new Column("`pos.NoZero`")).distinct().collect().mkString(", ") > {noformat} > will fail with this error: > {noformat} > org.apache.spark.sql.AnalysisException: Cannot resolve column name > "pos.NoZero" among (pos.NoZero); > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at > org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1$$anonfun$40.apply(DataFrame.scala:1329) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1329) > at > org.apache.spark.sql.DataFrame$$anonfun$dropDuplicates$1.apply(DataFrame.scala:1328) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2165) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1328) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1348) > at org.apache.spark.sql.DataFrame.dropDuplicates(DataFrame.scala:1319) > at org.apache.spark.sql.DataFrame.distinct(DataFrame.scala:1612) > at > com.mineset.spark.vizagg.selection.SelectionExpressionSuite$$anonfun$40.apply$mcV$sp(SelectionExpressionSuite.scala:317) > {noformat}
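The resolution failure reported above hinges on how a dotted column name is split: an unquoted dot denotes nested-field access, while backquotes mark the dot as part of a single column name. The sketch below mimics (but is not) Spark's attribute-name parsing; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeName {
    // Split a column reference on unquoted dots; backquotes toggle
    // quoting and are dropped from the resulting name parts.
    static List<String> parse(String name) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : name.toCharArray()) {
            if (c == '`') {
                inQuotes = !inQuotes;
            } else if (c == '.' && !inQuotes) {
                parts.add(cur.toString()); // unquoted dot: nested-field separator
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("`pos.NoZero`")); // [pos.NoZero]  - one column
        System.out.println(parse("pos.NoZero"));   // [pos, NoZero] - struct field access
    }
}
```

The bug is that dropDuplicates()/distinct() apparently lost this quote-aware treatment somewhere along the resolution path, so "pos.NoZero" was no longer matched against the literal column name.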
[jira] [Commented] (SPARK-15062) Show on DataFrame causes OutOfMemoryError, NegativeArraySizeException or segfault
[ https://issues.apache.org/jira/browse/SPARK-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267477#comment-15267477 ] Bo Meng commented on SPARK-15062: - I will make a PR shortly. > Show on DataFrame causes OutOfMemoryError, NegativeArraySizeException or > segfault > -- > > Key: SPARK-15062 > URL: https://issues.apache.org/jira/browse/SPARK-15062 > Project: Spark > Issue Type: Bug > Components: SQL > Environment: spark-2.0.0-SNAPSHOT using commit hash > 90787de864b58a1079c23e6581381ca8ffe7685f and Java 1.7.0_67 >Reporter: koert kuipers >Priority: Blocker > > {noformat} > scala> val dfComplicated = sc.parallelize(List((Map("1" -> "a"), List("b", > "c")), (Map("2" -> "b"), List("d", "e".toDF > ... > dfComplicated: org.apache.spark.sql.DataFrame = [_1: map, _2: > array] > scala> dfComplicated.show > java.lang.OutOfMemoryError: Java heap space > at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229) > at org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:241) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2121) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2121) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2121) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2408) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2120) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2127) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1861) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1860) > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2438) > at org.apache.spark.sql.Dataset.head(Dataset.scala:1860) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2077) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:238) > at org.apache.spark.sql.Dataset.show(Dataset.scala:529) > at org.apache.spark.sql.Dataset.show(Dataset.scala:489) > at org.apache.spark.sql.Dataset.show(Dataset.scala:498) > ... 6 elided > scala> > {noformat} > By increasing memory to 8G one will instead get a NegativeArraySizeException > or a segfault. > See here for original discussion: > http://apache-spark-developers-list.1001551.n3.nabble.com/spark-2-segfault-td17381.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14897) Upgrade Jetty to latest version of 8/9
[ https://issues.apache.org/jira/browse/SPARK-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267256#comment-15267256 ] Bo Meng commented on SPARK-14897: - I will do it once I've got a chance. Thanks. > Upgrade Jetty to latest version of 8/9 > -- > > Key: SPARK-14897 > URL: https://issues.apache.org/jira/browse/SPARK-14897 > Project: Spark > Issue Type: Improvement >Reporter: Adam Kramer > Labels: web-ui > > It looks like the head/master branch of Spark uses quite an old version of > Jetty: 8.1.14.v20131031 > There have been some announcement of security vulnerabilities, notably in > 2015 and there are versions of both 8 and 9 that address those. We recently > left a web-ui port open and had the server compromised within days. Albeit, > this upgrade shouldn't be the only security improvement made, the current > version is clearly vulnerable, as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0
[ https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261457#comment-15261457 ] Bo Meng commented on SPARK-14955: - Only after committer gets PR merged, then this JIRA will be automatically closed. Thanks. > JDBCRelation should report an IllegalArgumentException if stride equals 0 > - > > Key: SPARK-14955 > URL: https://issues.apache.org/jira/browse/SPARK-14955 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1, 1.6.1 >Reporter: Yang Juan hu >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In file > https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala > row 56 and 57 has following line > val stride: Long = (partitioning.upperBound / numPartitions > - partitioning.lowerBound / numPartitions) > if we invoke columnPartition as below: > columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) ); > columnPartition will generate following where condition: > whereClause: partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 > it will cause data skew, the last partition will contain all data. > Propose to throw an exception if stride equal 0, help spark user to aware > data skew issue ASAP. > if (stride == 0) return throw new > IllegalArgumentException("partitioning.upperBound / numPartitions - > partitioning.lowerBound / numPartitions is zero"); > partitionColumn must be an integral type, if we want to load a big table from > DBMS, we need to do some work around. 
> Real case to export data from ORACLE database through pyspark. > #data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7) AS PART_ID, A.* FROM DBMS_TAB A ) > TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="0", > upperBound="7").load() > #no data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7)+1 AS PART_ID, A.* FROM DBMS_TAB A > ) TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="1", > upperBound="8").load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
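The integer arithmetic quoted in the report can be sketched as follows. This is a standalone illustration of the stride computation and the proposed guard, not the actual JDBCRelation code:

```java
public class StrideCheck {
    // Same expression as the quoted JDBCRelation lines, in integer arithmetic.
    static long stride(long lowerBound, long upperBound, int numPartitions) {
        return upperBound / numPartitions - lowerBound / numPartitions;
    }

    public static void main(String[] args) {
        // lowerBound=0, upperBound=7, numPartitions=8:
        // 7/8 - 0/8 == 0, so every partition boundary collapses to 0 and
        // the last partition receives all the data (the skew described above).
        long s = stride(0, 7, 8);
        System.out.println("stride = " + s); // 0

        // Proposed fix: fail fast instead of silently skewing the partitions.
        try {
            if (s == 0) {
                throw new IllegalArgumentException(
                    "partitioning.upperBound / numPartitions - "
                    + "partitioning.lowerBound / numPartitions is zero");
            }
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // The pyspark workaround shifts the bounds to 1..8, giving a non-zero stride.
        System.out.println("stride = " + stride(1, 8, 8)); // 1
    }
}
```

This also shows why the ORA_HASH(...)+1 workaround in the report works: moving the bounds from 0..7 to 1..8 makes the quotient difference non-zero.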
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261126#comment-15261126 ] Bo Meng commented on SPARK-14959: - I have tried on master branch, it works fine with the latest code. > Problem Reading partitioned ORC or Parquet files > - > > Key: SPARK-14959 > URL: https://issues.apache.org/jira/browse/SPARK-14959 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.0.0 > Environment: Hadoop 2.7.1.2.4.0.0-169 (HDP 2.4) >Reporter: Sebastian YEPES FERNANDEZ >Priority: Critical > > Hello, > I have noticed that in the pasts days there is an issue when trying to read > partitioned files from HDFS. > I am running on Spark master branch #c544356 > The write actually works but the read fails. > {code:title=Issue Reproduction} > case class Data(id: Int, text: String) > val ds = spark.createDataset( Seq(Data(0, "hello"), Data(1, "hello"), Data(0, > "world"), Data(1, "there")) ) > scala> > ds.write.mode(org.apache.spark.sql.SaveMode.Overwrite).format("parquet").partitionBy("id").save("/user/spark/test.parquet") > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> java.io.FileNotFoundException: Path is not a file: > /user/spark/test.parquet/id=0 > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242) > at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227) > at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1285) > at > org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:221) > at > org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:217) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:228) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:209) > at > org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:372) > at > org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:360) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at >
[jira] [Commented] (SPARK-14965) StructType throws exception for missing field
[ https://issues.apache.org/jira/browse/SPARK-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261096#comment-15261096 ] Bo Meng commented on SPARK-14965: - I believe returning null does not make sense here, so an exception is preferred. > StructType throws exception for missing field > - > > Key: SPARK-14965 > URL: https://issues.apache.org/jira/browse/SPARK-14965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.6.1 >Reporter: Gregory Hart >Priority: Minor > > The ScalaDoc for StructType.apply(String) indicates the method should return > null if it does not contain a field with the given name. The method > implementation throws an exception in this case instead. > I suggest that either the implementation should be corrected to return null > if the field is not found, or the ScalaDoc be corrected to indicate an > exception is thrown.
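The two contracts under discussion can be contrasted with a small sketch. This is plain Java for illustration only, not the actual StructType implementation; the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldLookup {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String dataType) {
        fields.put(name, dataType);
    }

    // Contract the ScalaDoc describes: null for a missing field.
    String applyOrNull(String name) {
        return fields.get(name);
    }

    // Contract the implementation follows: fail loudly on a missing field,
    // so a typo in a field name surfaces immediately instead of as a
    // NullPointerException somewhere downstream.
    String apply(String name) {
        String t = fields.get(name);
        if (t == null) {
            throw new IllegalArgumentException("Field \"" + name + "\" does not exist.");
        }
        return t;
    }

    public static void main(String[] args) {
        FieldLookup schema = new FieldLookup();
        schema.add("id", "IntegerType");
        System.out.println(schema.apply("id"));            // IntegerType
        System.out.println(schema.applyOrNull("missing")); // null
        try {
            schema.apply("missing");
        } catch (IllegalArgumentException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

The comment above argues for the second contract: silently returning null just moves the failure away from its cause, so fixing the ScalaDoc rather than the implementation is the smaller and safer change.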
[jira] [Commented] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0
[ https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260902#comment-15260902 ] Bo Meng commented on SPARK-14955: - Please note that it also affect the master branch. > JDBCRelation should report an IllegalArgumentException if stride equals 0 > - > > Key: SPARK-14955 > URL: https://issues.apache.org/jira/browse/SPARK-14955 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1, 1.6.1 >Reporter: Yang Juan hu >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In file > https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala > row 56 and 57 has following line > val stride: Long = (partitioning.upperBound / numPartitions > - partitioning.lowerBound / numPartitions) > if we invoke columnPartition as below: > columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) ); > columnPartition will generate following where condition: > whereClause: partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 > it will cause data skew, the last partition will contain all data. > Propose to throw an exception if stride equal 0, help spark user to aware > data skew issue ASAP. > if (stride == 0) return throw new > IllegalArgumentException("partitioning.upperBound / numPartitions - > partitioning.lowerBound / numPartitions is zero"); > partitionColumn must be an integral type, if we want to load a big table from > DBMS, we need to do some work around. 
> Real case to export data from ORACLE database through pyspark. > #data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7) AS PART_ID, A.* FROM DBMS_TAB A ) > TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="0", > upperBound="7").load() > #no data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7)+1 AS PART_ID, A.* FROM DBMS_TAB A > ) TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="1", > upperBound="8").load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14897) Upgrade Jetty to latest version of 8/9
[ https://issues.apache.org/jira/browse/SPARK-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260740#comment-15260740 ] Bo Meng commented on SPARK-14897: - Although it would be easy to move to the latest 8.x version, Jetty 8 is EOL. See: [http://download.eclipse.org/jetty/] Jetty 9 has some significant changes: the thread pool now needs to be passed into the Server constructor, while Spark's current implementation calculates the pool size based on the connectors. So I am not sure what has to be done to keep it compatible with the current implementation. Any suggestions? > Upgrade Jetty to latest version of 8/9 > -- > > Key: SPARK-14897 > URL: https://issues.apache.org/jira/browse/SPARK-14897 > Project: Spark > Issue Type: Improvement >Reporter: Adam Kramer > Labels: web-ui > > It looks like the head/master branch of Spark uses quite an old version of > Jetty: 8.1.14.v20131031 > There have been some announcement of security vulnerabilities, notably in > 2015 and there are versions of both 8 and 9 that address those. We recently > left a web-ui port open and had the server compromised within days. Albeit, > this upgrade shouldn't be the only security improvement made, the current > version is clearly vulnerable, as-is.
[jira] [Commented] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0
[ https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260352#comment-15260352 ] Bo Meng commented on SPARK-14955: - I will take a look to see what can be improved. > JDBCRelation should report an IllegalArgumentException if stride equals 0 > - > > Key: SPARK-14955 > URL: https://issues.apache.org/jira/browse/SPARK-14955 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1, 1.6.1 >Reporter: Yang Juan hu >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In file > https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala > row 56 and 57 has following line > val stride: Long = (partitioning.upperBound / numPartitions > - partitioning.lowerBound / numPartitions) > if we invoke columnPartition as below: > columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) ); > columnPartition will generate following where condition: > whereClause: partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 AND partitionColumn < 0 > whereClause: partitionColumn >= 0 > it will cause data skew, the last partition will contain all data. > Propose to throw an exception if stride equal 0, help spark user to aware > data skew issue ASAP. > if (stride == 0) return throw new > IllegalArgumentException("partitioning.upperBound / numPartitions - > partitioning.lowerBound / numPartitions is zero"); > partitionColumn must be an integral type, if we want to load a big table from > DBMS, we need to do some work around. 
> Real case to export data from ORACLE database through pyspark. > #data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7) AS PART_ID, A.* FROM DBMS_TAB A ) > TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="0", > upperBound="7").load() > #no data skew issue version > df=ssc.read.format("jdbc").options( url=url, > dbtable="( SELECT ORA_HASH(PART_COL,7)+1 AS PART_ID, A.* FROM DBMS_TAB A > ) TAB_ALIAS", > fetchSize="1000", > partitionColumn="PART_ID", > numPartitions="8", > lowerBound="1", > upperBound="8").load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14928) Support variable substitution in SET command
[ https://issues.apache.org/jira/browse/SPARK-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng closed SPARK-14928. --- Resolution: Won't Fix > Support variable substitution in SET command > > > Key: SPARK-14928 > URL: https://issues.apache.org/jira/browse/SPARK-14928 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > In the {{SET key=value}} command, value can be defined as a variable and > replaced by substitution. > Since we have {{VARIABLE_SUBSTITUTE_ENABLED}} and > {{VARIABLE_SUBSTITUTE_DEPTH}} defined in the {{SQLConf}}, it is nice to use > them in the SET command.
[jira] [Commented] (SPARK-14928) Support variable substitution in SET command
[ https://issues.apache.org/jira/browse/SPARK-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259069#comment-15259069 ] Bo Meng commented on SPARK-14928: - The parser already handles this, so I am closing the issue. > Support variable substitution in SET command > > > Key: SPARK-14928 > URL: https://issues.apache.org/jira/browse/SPARK-14928 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > In the {{SET key=value}} command, value can be defined as a variable and > replaced by substitution. > Since we have {{VARIABLE_SUBSTITUTE_ENABLED}} and > {{VARIABLE_SUBSTITUTE_DEPTH}} defined in the {{SQLConf}}, it is nice to use > them in the SET command.
[jira] [Created] (SPARK-14928) Support variable substitution in SET command
Bo Meng created SPARK-14928: --- Summary: Support variable substitution in SET command Key: SPARK-14928 URL: https://issues.apache.org/jira/browse/SPARK-14928 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng In the {{SET key=value}} command, value can be defined as a variable and replaced by substitution. Since we have {{VARIABLE_SUBSTITUTE_ENABLED}} and {{VARIABLE_SUBSTITUTE_DEPTH}} defined in the {{SQLConf}}, it is nice to use them in the SET command.
[jira] [Updated] (SPARK-14923) Support "Extended" in "Describe" table DDL
[ https://issues.apache.org/jira/browse/SPARK-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14923: Description: Currently, the {{Extended}} keyword in the {{Describe [Extended] }} DDL is simply ignored. This JIRA is to bring it back with similar behavior to what Hive does. > Support "Extended" in "Describe" table DDL > -- > > Key: SPARK-14923 > URL: https://issues.apache.org/jira/browse/SPARK-14923 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently, {{Extended}} keywords in {{Describe [Extended] }} DDL > is simply ignored. This JIRA is to bring it back with the similar behavior as > Hive does.
[jira] [Created] (SPARK-14923) Support "Extended" in "Describe" table DDL
Bo Meng created SPARK-14923: --- Summary: Support "Extended" in "Describe" table DDL Key: SPARK-14923 URL: https://issues.apache.org/jira/browse/SPARK-14923 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Currently, the {{Extended}} keyword in {{Describe [Extended] }} DDL is simply ignored. This JIRA is to bring it back with behavior similar to Hive's.
[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253400#comment-15253400 ] Bo Meng commented on SPARK-14840: - I am testing against master: 1. I do not think your test is valid, at least it should be: sqlContext.sql("drop table tmp.order") 2. It works fine if you just add {{`}} to {{order}}, without it, it will throw exception. sqlContext.sql("drop table `order`"); I have ignored {{tmp}} here. > Cannot drop a table which has the name starting with 'or' > - > > Key: SPARK-14840 > URL: https://issues.apache.org/jira/browse/SPARK-14840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Kwangwoo Kim > > sqlContext("drop table tmp.order") > The above code makes error as following: > 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order > 16/04/22 14:27:19 INFO ParseDriver: Parse Completed > 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected > tmp.order > ^ > java.lang.RuntimeException: [1.5] failure: identifier expected > tmp.order > ^ > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) > at > $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37) > at $line15.$read$$iwC$$iwC$$iwC.(:39) > at $line15.$read$$iwC$$iwC.(:41) > at $line15.$read$$iwC.(:43) > at $line15.$read.(:45) > at $line15.$read$.(:49) > at $line15.$read$.() > at $line15.$eval$.(:7) > at $line15.$eval$.() > at $line15.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at 
org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at >
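The backtick workaround suggested in the comment above can be sketched as a small identifier-quoting helper. This is a hypothetical pure-Python illustration of the idea, not Spark's parser; the reserved-word list here is a tiny stand-in for the real grammar's keyword set.

```python
# Tiny stand-in for a SQL grammar's reserved-word list (illustrative only).
RESERVED = {"order", "select", "from", "where", "group", "by"}

def quote_if_needed(identifier):
    """Backtick-quote an identifier when it collides with a SQL keyword.

    This is why `drop table tmp.order` fails while
    drop table tmp.`order` parses: the bare keyword is rejected by
    the identifier rule, while the quoted form is accepted.
    """
    if identifier.lower() in RESERVED:
        return "`%s`" % identifier
    return identifier

def qualified_name(db, table):
    # Quote each part of a db.table name independently.
    return "%s.%s" % (quote_if_needed(db), quote_if_needed(table))
```

So a safe drop statement would be built as {{"drop table " + qualified_name("tmp", "order")}}.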
[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253376#comment-15253376 ] Bo Meng commented on SPARK-14840: - I think because {{order}} is a keyword, please try not to use it. > Cannot drop a table which has the name starting with 'or' > - > > Key: SPARK-14840 > URL: https://issues.apache.org/jira/browse/SPARK-14840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Kwangwoo Kim > > sqlContext("drop table tmp.order") > The above code makes error as following: > 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order > 16/04/22 14:27:19 INFO ParseDriver: Parse Completed > 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected > tmp.order > ^ > java.lang.RuntimeException: [1.5] failure: identifier expected > tmp.order > ^ > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at 
org.apache.spark.sql.DataFrame.(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) > at > $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37) > at $line15.$read$$iwC$$iwC$$iwC.(:39) > at $line15.$read$$iwC$$iwC.(:41) > at $line15.$read$$iwC.(:43) > at $line15.$read.(:45) > at $line15.$read$.(:49) > at $line15.$read$.() > at $line15.$eval$.(:7) > at $line15.$eval$.() > at $line15.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at
[jira] [Comment Edited] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253376#comment-15253376 ] Bo Meng edited comment on SPARK-14840 at 4/22/16 5:34 AM: -- I think because {{order}} is a keyword, please try not to use it as table name. was (Author: bomeng): I think because {{order}} is a keyword, please try not to use it. > Cannot drop a table which has the name starting with 'or' > - > > Key: SPARK-14840 > URL: https://issues.apache.org/jira/browse/SPARK-14840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Kwangwoo Kim > > sqlContext("drop table tmp.order") > The above code makes error as following: > 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order > 16/04/22 14:27:19 INFO ParseDriver: Parse Completed > 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected > tmp.order > ^ > java.lang.RuntimeException: [1.5] failure: identifier expected > tmp.order > ^ > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) > at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > 
at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) > at > $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37) > at $line15.$read$$iwC$$iwC$$iwC.(:39) > at $line15.$read$$iwC$$iwC.(:41) > at $line15.$read$$iwC.(:43) > at $line15.$read.(:45) > at $line15.$read$.(:49) > at $line15.$read$.() > at $line15.$eval$.(:7) > at $line15.$eval$.() > at $line15.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at >
[jira] [Issue Comment Deleted] (SPARK-14541) SQL function: IFNULL, NULLIF, NVL and NVL2
[ https://issues.apache.org/jira/browse/SPARK-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14541: Comment: was deleted (was: I will try to do it one by one. ) > SQL function: IFNULL, NULLIF, NVL and NVL2 > -- > > Key: SPARK-14541 > URL: https://issues.apache.org/jira/browse/SPARK-14541 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > It will be great to have these SQL functions: > IFNULL, NULLIF, NVL, NVL2 > The meaning of these functions could be found in oracle docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14819) Improve the "SET" and "SET -v" command
[ https://issues.apache.org/jira/browse/SPARK-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14819: Description: Currently {{SET}} and {{SET -v}} commands are similar to Hive {{SET}} command except the following difference: 1. The result is not sorted; 2. When using {{SET}} and {{SET -v}}, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current {{SET}} command more consistent to Hive output. was: Currently `SET` and `SET -v` commands are similar to Hive `SET` command except the following difference: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent to Hive output. > Improve the "SET" and "SET -v" command > -- > > Key: SPARK-14819 > URL: https://issues.apache.org/jira/browse/SPARK-14819 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently {{SET}} and {{SET -v}} commands are similar to Hive {{SET}} command > except the following difference: > 1. The result is not sorted; > 2. When using {{SET}} and {{SET -v}}, in addition to the Hive related > properties, it will also list all the system properties and environment > properties, which is very useful in some cases. > This JIRA is trying to make the current {{SET}} command more consistent to > Hive output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14819) Improve the "SET" and "SET -v" command
[ https://issues.apache.org/jira/browse/SPARK-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14819: Description: Currently `SET` and `SET -v` commands are similar to Hive `SET` command except the following difference: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent to Hive output. was: Currently `SET` and `SET -v` commands are similar to Hive `SET` command except the following difference: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some case. This JIRA is trying to make the current `SET` command more consistent to Hive output. > Improve the "SET" and "SET -v" command > -- > > Key: SPARK-14819 > URL: https://issues.apache.org/jira/browse/SPARK-14819 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng > > Currently `SET` and `SET -v` commands are similar to Hive `SET` command > except the following difference: > 1. The result is not sorted; > 2. When using `SET` and `SET -v`, in addition to the Hive related properties, > it will also list all the system properties and environment properties, which > is very useful in some cases. > This JIRA is trying to make the current `SET` command more consistent to Hive > output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14819) Improve the "SET" and "SET -v" command
Bo Meng created SPARK-14819: --- Summary: Improve the "SET" and "SET -v" command Key: SPARK-14819 URL: https://issues.apache.org/jira/browse/SPARK-14819 Project: Spark Issue Type: Improvement Components: SQL Reporter: Bo Meng Currently `SET` and `SET -v` commands are similar to the Hive `SET` command except for the following differences: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive-related properties, they will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent with Hive's output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
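The two points above (sorted output; conf entries merged with system/environment properties) can be sketched in a few lines. This is a hypothetical pure-Python illustration of the desired behavior, not Spark's `SET` implementation.

```python
import os

def set_v_output(sql_conf, include_system=True):
    """Render SET -v style output as sorted key=value lines.

    sql_conf is a plain dict standing in for SQLConf entries; when
    include_system is set, environment properties are merged in as
    well, mirroring point 2 of the issue. Sorting the merged result
    addresses point 1. Illustrative sketch only.
    """
    entries = dict(sql_conf)
    if include_system:
        entries.update(os.environ)
    return ["%s=%s" % (k, v) for k, v in sorted(entries.items())]
```

For example, `set_v_output({"b": "2", "a": "1"}, include_system=False)` yields the keys in sorted order rather than insertion order.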
[jira] [Commented] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248265#comment-15248265 ] Bo Meng commented on SPARK-14414: - Can anyone update the 'Assignee' for this one, since my code was already merged in? If there is still something left I can work on, please advise, thanks! > Make error messages consistent across DDLs > -- > > Key: SPARK-14414 > URL: https://issues.apache.org/jira/browse/SPARK-14414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > There are many different error messages right now when the user tries to run > something that's not supported. We might throw AnalysisException or > ParseException or NoSuchFunctionException etc. We should make all of these > consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14460) DataFrameWriter JDBC doesn't Quote/Escape column names
[ https://issues.apache.org/jira/browse/SPARK-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242330#comment-15242330 ] Bo Meng commented on SPARK-14460: - I have added a test case that uses "order" as a column name. Please check it out. Thanks. > DataFrameWriter JDBC doesn't Quote/Escape column names > -- > > Key: SPARK-14460 > URL: https://issues.apache.org/jira/browse/SPARK-14460 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Sean Rose > Labels: easyfix > > When I try to write a DataFrame which contains a column with a space in it > ("Patient Address"), I get an error: java.sql.BatchUpdateException: Incorrect > syntax near 'Address' > I believe the issue is that JdbcUtils.insertStatement isn't quoting/escaping > column names. JdbcDialect has the "quoteIdentifier" method, which could be > called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
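The fix direction the reporter suggests (route every column name through the dialect's quote method when building the INSERT) can be sketched as follows. This is a hedged pure-Python illustration, not the actual JdbcUtils code; the helper names are hypothetical.

```python
def quote_identifier(col, quote='"'):
    # Escape embedded quote characters by doubling them, then wrap,
    # the usual ANSI-style identifier quoting convention.
    return quote + col.replace(quote, quote * 2) + quote

def insert_statement(table, columns):
    """Build a parameterized INSERT with every column name quoted.

    Without quoting, a column like "Patient Address" produces
    broken SQL (syntax error near 'Address'); quoting each name
    makes spaces and keywords safe.
    """
    cols = ", ".join(quote_identifier(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, placeholders)
```

For example, `insert_statement("t", ["Patient Address", "id"])` keeps the spaced column name intact inside double quotes.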
[jira] [Commented] (SPARK-14614) Add `bround` function
[ https://issues.apache.org/jira/browse/SPARK-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240259#comment-15240259 ] Bo Meng commented on SPARK-14614: - I have tried this on Hive 1.2.1; this function appears to have been dropped. > Add `bround` function > - > > Key: SPARK-14614 > URL: https://issues.apache.org/jira/browse/SPARK-14614 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Dongjoon Hyun > > This issue aims to add the `bround` function (aka banker's rounding) by extending > the current `round` implementation. > Hive supports `bround` since 1.3.0. [Language > Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]. > {code} > hive> select round(2.5), bround(2.5); > OK > 3.0 2.0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
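The `round` vs `bround` difference in the Hive snippet above comes down to tie-breaking: half-up rounding versus banker's (half-even) rounding. A small sketch of the semantics using Python's decimal module (illustrative of the behavior, not Spark's implementation):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_half_up(x, scale=0):
    # Plain `round`: ties round away from zero (2.5 -> 3.0).
    q = Decimal(1).scaleb(-scale)
    return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_UP))

def bround(x, scale=0):
    # Banker's rounding: ties go to the nearest even digit
    # (2.5 -> 2.0, 3.5 -> 4.0), which avoids a systematic upward
    # bias when summing many rounded values.
    q = Decimal(1).scaleb(-scale)
    return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_EVEN))
```

This reproduces the Hive example: `round_half_up(2.5)` gives 3.0 while `bround(2.5)` gives 2.0.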
[jira] [Commented] (SPARK-14541) SQL function: IFNULL, NULLIF, NVL and NVL2
[ https://issues.apache.org/jira/browse/SPARK-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240157#comment-15240157 ] Bo Meng commented on SPARK-14541: - I will try to do it one by one. > SQL function: IFNULL, NULLIF, NVL and NVL2 > -- > > Key: SPARK-14541 > URL: https://issues.apache.org/jira/browse/SPARK-14541 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > It will be great to have these SQL functions: > IFNULL, NULLIF, NVL, NVL2 > The meaning of these functions could be found in oracle docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
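The Oracle-style semantics of the four requested functions can be sketched compactly. This is a pure-Python illustration of the null-handling behavior (with `None` standing in for SQL NULL), not Spark's eventual implementation:

```python
def ifnull(a, b):
    # IFNULL(a, b): b when a is null, otherwise a.
    return b if a is None else a

def nullif(a, b):
    # NULLIF(a, b): null when a equals b, otherwise a.
    return None if a == b else a

def nvl(a, b):
    # NVL is a synonym for IFNULL in most dialects.
    return ifnull(a, b)

def nvl2(a, b, c):
    # NVL2(a, b, c): b when a is NOT null, otherwise c.
    return c if a is None else b
```

Note the `is None` checks: a falsy-but-non-null value like 0 must still count as "not null", so truthiness tests would be wrong here.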
[jira] [Commented] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236359#comment-15236359 ] Bo Meng commented on SPARK-14441: - I think DDLSuite and DDLCommandSuite can be combined into one, as can HiveDDLSuite and HiveDDLCommandSuite, since they are just testing different stages. If you agree, I will make the changes. > Consolidate DDL tests > - > > Key: SPARK-14441 > URL: https://issues.apache.org/jira/browse/SPARK-14441 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing > whether a test should exist in one or the other. It also makes it less clear > whether our test coverage is comprehensive. Ideally we should consolidate > these files as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14532) Spark SQL IF/ELSE does not handle Double correctly
[ https://issues.apache.org/jira/browse/SPARK-14532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235795#comment-15235795 ] Bo Meng commented on SPARK-14532: - Just verified, it works fine with master. > Spark SQL IF/ELSE does not handle Double correctly > -- > > Key: SPARK-14532 > URL: https://issues.apache.org/jira/browse/SPARK-14532 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Al M > > I am using Spark SQL to add new columns to my data. Below is an example > snipped in Scala: > {code}myDF.withColumn("newcol", new > Column(SqlParser.parseExpression(sparkSqlExpr))).show{code} > *What Works* > If sparkSqlExpr = "IF(1=1, 1, 0)" then i see 1 in the result as expected. > If sparkSqlExpr = "IF(1=1, 1.0, 1.5)" then i see 1.0 in the result as > expected. > If sparkSqlExpr = "IF(1=1, 'A', 'B')" then i see 'A' in the result as > expected. > *What does not Work* > If sparkSqlExpr = "IF(1=1, 1.0, 0.0)" then I see error > org.apache.spark.sql.AnalysisException: cannot resolve 'if ((1 = 1)) 1.0 else > 0.0' due to data type mismatch: differing types in 'if ((1 = 1)) 1.0 else > 0.0' (decimal(2,1) and decimal(1,1)).; > If sparkSqlExpr = "IF(1=1, 1.0, 10.0)" then I see error If sparkSqlExpr = > "IF(1=1, 1.0, 0.0)" then I see error > org.apache.spark.sql.AnalysisException: cannot resolve 'if ((1 = 1)) 1.0 else > 10.0' due to data type mismatch: differing types in 'if ((1 = 1)) 1.0 else > 10.0' (decimal(2,1) and decimal(3,1)).; > If sparkSqlExpr = "IF(1=1, 1.1, 1.11)" then I see error > org.apache.spark.sql.AnalysisException: cannot resolve 'if ((1 = 1)) 1.1 else > 1.11' due to data type mismatch: differing types in 'if ((1 = 1)) 1.1 else > 1.11' (decimal(2,1) and decimal(3,2)).; > It looks like the Spark SQL typing system is seeing doubles as different > types depending on the number of digits before and after the decimal point -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
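The analysis errors quoted above boil down to finding a common decimal type for the two branches of the IF: `decimal(2,1)` and `decimal(1,1)` should unify rather than fail. The standard widening rule (keep the larger scale, and enough integer digits for either operand) can be sketched as follows. This is a hypothetical helper illustrating the promotion idea, not Spark's actual rules, which additionally bound the total precision:

```python
def widen_decimal(p1, s1, p2, s2):
    """Common decimal(p, s) type for decimal(p1,s1) and decimal(p2,s2).

    Keep the larger scale, plus enough integer digits to hold
    either operand, so e.g. decimal(2,1) and decimal(1,1) widen to
    decimal(2,1) instead of raising a type-mismatch error.
    """
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return (int_digits + scale, scale)
```

Under this rule the failing examples from the report all resolve: 1.0 vs 0.0 widens to decimal(2,1), and 1.1 vs 1.11 widens to decimal(3,2).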
[jira] [Comment Edited] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng edited comment on SPARK-14127 at 4/9/16 9:50 PM: - A little change to your creation of table will show comment in the "DESCRIBE": {quote}create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);{quote} was (Author: bomeng): A little change of your creation of table will show comment in the "DESCRIBE": {quote}create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);{quote} > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng edited comment on SPARK-14127 at 4/9/16 9:49 PM: - A little change of your creation of table will show comment in the "DESCRIBE": {quote}create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);{quote} was (Author: bomeng): A little change of your creation of table will show comment in the "DESCRIBE": create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string); > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng edited comment on SPARK-14127 at 4/9/16 9:49 PM: - A little change of your creation of table will show comment in the "DESCRIBE": create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string); was (Author: bomeng): A little change of your creation of table will show comment in the "DESCRIBE": `create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);` > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233748#comment-15233748 ] Bo Meng commented on SPARK-14127: - A little change of your creation of table will show comment in the "DESCRIBE": `create table ptestfilter (a string, b int) partitioned by (c string comment 'abc', d string);` > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14496) some typos in the java doc while browsing the codes
Bo Meng created SPARK-14496: --- Summary: some typos in the java doc while browsing the codes Key: SPARK-14496 URL: https://issues.apache.org/jira/browse/SPARK-14496 Project: Spark Issue Type: Bug Components: SQL Reporter: Bo Meng Priority: Trivial Really minor issues. I just found them while looking into the catalog codes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14460) DataFrameWriter JDBC doesn't Quote/Escape column names
[ https://issues.apache.org/jira/browse/SPARK-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231077#comment-15231077 ] Bo Meng edited comment on SPARK-14460 at 4/8/16 3:04 AM: - Thanks [~srose03] for finding the root cause - that makes the fix easier. I will post the fix shortly. was (Author: bomeng): I can take a look. Thanks. > DataFrameWriter JDBC doesn't Quote/Escape column names > -- > > Key: SPARK-14460 > URL: https://issues.apache.org/jira/browse/SPARK-14460 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Sean Rose > Labels: easyfix > > When I try to write a DataFrame which contains a column with a space in it > ("Patient Address"), I get an error: java.sql.BatchUpdateException: Incorrect > syntax near 'Address' > I believe the issue is that JdbcUtils.insertStatement isn't quoting/escaping > column names. JdbcDialect has the "quoteIdentifier" method, which could be > called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14460) DataFrameWriter JDBC doesn't Quote/Escape column names
[ https://issues.apache.org/jira/browse/SPARK-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231077#comment-15231077 ] Bo Meng commented on SPARK-14460: - I can take a look. Thanks. > DataFrameWriter JDBC doesn't Quote/Escape column names > -- > > Key: SPARK-14460 > URL: https://issues.apache.org/jira/browse/SPARK-14460 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Sean Rose > Labels: easyfix > > When I try to write a DataFrame which contains a column with a space in it > ("Patient Address"), I get an error: java.sql.BatchUpdateException: Incorrect > syntax near 'Address' > I believe the issue is that JdbcUtils.insertStatement isn't quoting/escaping > column names. JdbcDialect has the "quoteIdentifier" method, which could be > called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
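The fix suggested in the issue can be sketched outside Spark. Below is a minimal Python illustration, using the stdlib sqlite3 driver rather than Spark's JdbcUtils, of why quoting every identifier (the role JdbcDialect's quoteIdentifier plays) makes a column name containing a space valid SQL. The helpers quote_identifier and insert_statement are hypothetical, not Spark APIs:

```python
import sqlite3

def quote_identifier(name: str) -> str:
    # Double any embedded quote characters, then wrap the whole
    # identifier in double quotes (ANSI SQL delimited-identifier style).
    return '"' + name.replace('"', '""') + '"'

def insert_statement(table: str, columns: list) -> str:
    # Build an INSERT with every identifier quoted, so names with
    # spaces (e.g. "Patient Address") no longer break the statement.
    cols = ", ".join(quote_identifier(c) for c in columns)
    params = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_identifier(table)} ({cols}) VALUES ({params})"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE patients ("Patient Address" TEXT)')
conn.execute(insert_statement("patients", ["Patient Address"]), ("42 Main St",))
row = conn.execute('SELECT "Patient Address" FROM patients').fetchone()
print(row[0])  # 42 Main St
```

Without the quoting, the generated `INSERT INTO patients (Patient Address) ...` fails to parse, which matches the "Incorrect syntax near 'Address'" error in the report.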
[jira] [Updated] (SPARK-14429) Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Summary: Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL (was: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL) > Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc. DDL. In the pattern, users can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA: > # Currently, we use `replaceAll()` to replace `\*` with `.\*`, but the > replacement is scattered across several places; it would be good to do it in > one place; > # Consistency with Hive: the pattern is case-insensitive in Hive and > whitespace is trimmed, but the current pattern matching does not do that. For > example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* > '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
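The three items in the issue can be sketched as a small, hypothetical Python helper. The real fix lives in Spark's Scala code; filter_pattern below only mirrors the behavior the JIRA asks for - trim surrounding whitespace, match case-insensitively, treat `*` as a wildcard and `|` as alternation, and sort the result:

```python
import re

def filter_pattern(names, pattern):
    # Normalize the SHOW TABLES/FUNCTIONS LIKE pattern as the JIRA
    # describes: trim whitespace, split on '|' into alternatives,
    # and turn '*' into the regex wildcard '.*'.
    regexes = [p.strip().replace("*", ".*") for p in pattern.strip().split("|")]
    matched = [n for n in names
               if any(re.fullmatch(r, n, re.IGNORECASE) for r in regexes)]
    return sorted(matched)  # item 3: return the result sorted

tables = ["t2", "t1", "s1", "t3"]
print(filter_pattern(tables, " T* "))  # ['t1', 't2', 't3']
```

With this normalization, the example from the issue works: `' T* '` lists all the t-tables despite the surrounding spaces and the upper-case `T`.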
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. # Sort the result. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. # Sort the result. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. 
> I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* > '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. # Sort the result. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. 
> I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. 
> # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA: # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; # Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. 
> I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* ' {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}SHOW TABLES LIKE ' T* ' {panel} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* ' {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}SHOW TABLES LIKE ' T* ' {panel} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}`SHOW TABLES LIKE ' T* ' `{panel} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {panel}SHOW TABLES LIKE ' T* > ' {panel} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* '; {code} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' T* ' {code} will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {code:SQL}SHOW TABLES LIKE ' > T* '; {code} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), {panel}`SHOW TABLES LIKE ' T* ' `{panel} will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), {panel}`SHOW TABLES LIKE ' T* > ' `{panel} will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace {block}`\*` with `.\*`{block}, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` > will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace {block}`\*` with `.\*`{block}, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace {block}`\*` with > `.\*`{block}, but the replacement was scattered in several places; it is good > to have one place to do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` > will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429: Description: LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. was: LIKE is commonly used in SHOW TABLES / FUNCTIONS etc DDL. In the pattern, user can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA. 1. Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the replacement was scattered in several places; it is good to have one place to do the same thing; 2. Consistency with Hive: the pattern is case insensitive in Hive and white spaces will be trimmed, but current pattern matching does not do that. For example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` will list all the t-tables. Please use Hive to verify it. > Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Priority: Minor > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA. > 1. 
Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > 2. Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables (t1, t2, t3), `SHOW TABLES LIKE ' T* ' ` > will list all the t-tables. Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429:

Description:
LIKE is commonly used in SHOW TABLES / FUNCTIONS and similar DDL. In the pattern, users can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA:
1. Currently we use `replaceAll()` to replace `\*` with `.\*`, but the replacement is scattered across several places; it would be better to do it in one place.
2. Consistency with Hive: in Hive the pattern is case insensitive and surrounding whitespace is trimmed, but the current pattern matching does neither. For example, given tables (t1, t2, t3), `SHOW TABLES LIKE ' T* '` will list all the t-tables. Please use Hive to verify it.

was: (the same text, ending with "I will post a PR shortly.")
[jira] [Updated] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14429:

Summary: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL (was: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE ")
[jira] [Created] (SPARK-14429) Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE "
Bo Meng created SPARK-14429:

Summary: Revisit LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE "
Key: SPARK-14429
URL: https://issues.apache.org/jira/browse/SPARK-14429
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Bo Meng
Priority: Minor

LIKE is commonly used in SHOW TABLES / FUNCTIONS and similar DDL. In the pattern, users can use `|` or `\*` as wildcards. I'd like to address a few issues in this JIRA:
1. Currently we use `replaceAll()` to replace `\*` with `.\*`, but the replacement is scattered across several places; it would be better to do it in one place.
2. Consistency with Hive: in Hive the pattern is case insensitive and surrounding whitespace is trimmed, but the current pattern matching does neither. For example, given tables (t1, t2, t3), `SHOW TABLES LIKE ' T* '` will list all the t-tables. Please use Hive to verify it.
I will post a PR shortly.
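The Hive-compatible matching described in this issue (trimmed pattern, case-insensitive matching, `|` separating alternatives, `*` as a wildcard) can be centralized in a single helper. The sketch below is a hypothetical illustration of the proposal, not Spark's actual implementation; the `LikePattern` object and its method names are invented:

```scala
object LikePattern {
  // Convert a SHOW TABLES / FUNCTIONS LIKE pattern into regexes, Hive-style:
  // trim surrounding whitespace, split alternatives on '|', replace the SQL
  // wildcard '*' with the regex '.*', and match case-insensitively via (?i).
  def toRegexes(pattern: String): Seq[scala.util.matching.Regex] =
    pattern.trim.split("\\|").toSeq.map { alt =>
      ("(?i)" + alt.trim.replaceAll("\\*", ".*")).r
    }

  // A name matches if any alternative matches it in full.
  def matches(pattern: String, name: String): Boolean =
    toRegexes(pattern).exists(_.pattern.matcher(name).matches())
}
```

With tables t1, t2, t3, `LikePattern.matches(" T* ", "t1")` returns true, mirroring the `SHOW TABLES LIKE ' T* '` example from the description.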
[jira] [Commented] (SPARK-14398) Audit non-reserved keyword list in ANTLR4 parser.
[ https://issues.apache.org/jira/browse/SPARK-14398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225767#comment-15225767 ] Bo Meng commented on SPARK-14398:

Not a problem. I will work on it tomorrow. Thanks.

> Audit non-reserved keyword list in ANTLR4 parser.
> Key: SPARK-14398
> URL: https://issues.apache.org/jira/browse/SPARK-14398
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Herman van Hovell
> Fix For: 2.0.0
>
> We need to check whether all keywords that were non-reserved in the `old` ANTLR3 parser are still non-reserved in the ANTLR4 parser. Notable exceptions are the join keywords {{LEFT}}, {{RIGHT}} & {{FULL}}; these used to be non-reserved and are now reserved.
[jira] [Updated] (SPARK-14383) Missing "|" in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14383:

Summary: Missing "|" in the g4 definition (was: Missing | in the g4 definition)

> Missing "|" in the g4 definition
> Key: SPARK-14383
> URL: https://issues.apache.org/jira/browse/SPARK-14383
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Bo Meng
> Priority: Trivial
>
> It is a trivial bug I found in the g4 file: a "|" is missing between DISTRIBUTE and UNSET. I will post the PR shortly.
[jira] [Updated] (SPARK-14383) Missing | in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14383:

Description: It is a trivial bug I found in the g4 file: a "|" is missing between DISTRIBUTE and UNSET. I will post the PR shortly. (was: It is really a trivial bug in the g4 file I found. I will post the PR shortly.)
[jira] [Updated] (SPARK-14383) Missing | in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14383:

Description: It is really a trivial bug in the g4 file I found. I will post the PR shortly.
[jira] [Created] (SPARK-14383) Missing | in the g4 definition
Bo Meng created SPARK-14383:

Summary: Missing | in the g4 definition
Key: SPARK-14383
URL: https://issues.apache.org/jira/browse/SPARK-14383
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Bo Meng
Priority: Trivial
[jira] [Closed] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng closed SPARK-14323.

Resolution: Won't Fix

> [SQL] SHOW FUNCTIONS did not work properly
> Key: SPARK-14323
> URL: https://issues.apache.org/jira/browse/SPARK-14323
> Project: Spark
> Issue Type: Bug
> Reporter: Bo Meng
> Priority: Minor
>
> The SHOW FUNCTIONS syntax is described here:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
> When "\*" is used in the LIKE clause, the expected results are not returned. This is because "\*" is not escaped before being passed to the regex. Unescaped, a pattern such as "\*f\*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
> Try this:
> val p = "\*f\*".r
[jira] [Comment Edited] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225083#comment-15225083 ] Bo Meng edited comment on SPARK-14323 at 4/4/16 9:26 PM:

I did a deep investigation of pattern matching for LIKE in show tables/functions. Here is what I found: Hive only supports \* and | as wildcards. Using ".\*" to replace "\*" is right. The only issue is that ShowFunctions() in commands.scala currently does not use it, which causes the test cases to fail. Using listFunctions() in SessionCatalog should resolve the problem; that will happen in SPARK-14123.

was (Author: bomeng): (the same comment, with the wildcards written as unescaped * rather than \*)
[jira] [Commented] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225083#comment-15225083 ] Bo Meng commented on SPARK-14323:

I did a deep investigation of pattern matching for LIKE in show tables/functions. Here is what I found: Hive only supports * and | as wildcards. Using ".*" to replace "*" is right. The only issue is that ShowFunctions() in commands.scala currently does not use it, which causes the test cases to fail. Using listFunctions() in SessionCatalog should resolve the problem; that will happen in SPARK-14123.
[jira] [Commented] (SPARK-14283) Avoid sort in randomSplit when possible
[ https://issues.apache.org/jira/browse/SPARK-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223637#comment-15223637 ] Bo Meng commented on SPARK-14283:

Could you please provide more details, such as test cases, use cases, etc.?

> Avoid sort in randomSplit when possible
> Key: SPARK-14283
> URL: https://issues.apache.org/jira/browse/SPARK-14283
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Joseph K. Bradley
>
> Dataset.randomSplit sorts each partition in order to guarantee an ordering and make randomSplit deterministic given the seed. Since randomSplit is used a fair amount in ML, it would be great to avoid the sort when possible. Are there cases when it could be avoided?
[jira] [Created] (SPARK-14341) Throw exception on unsupported Create/Drop Macro DDL commands
Bo Meng created SPARK-14341:

Summary: Throw exception on unsupported Create/Drop Macro DDL commands
Key: SPARK-14341
URL: https://issues.apache.org/jira/browse/SPARK-14341
Project: Spark
Issue Type: Improvement
Reporter: Bo Meng
Priority: Minor

According to [SPARK-14123|https://issues.apache.org/jira/browse/SPARK-14123], we need to throw an exception for the Create/Drop Macro DDL commands.
[jira] [Updated] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14323:

Description:
The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "*" is used in the LIKE clause, the expected results are not returned. This is because "*" is not escaped before being passed to the regex. Unescaped, a pattern such as "*f*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
Try this:
val p = "\*f\*".r

was: (the same text, with the last line written as: val p = "*f*".r)
[jira] [Updated] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14323:

Description:
The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "*" is used in the LIKE clause, the expected results are not returned. This is because "*" is not escaped before being passed to the regex. Unescaped, a pattern such as "*f*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
Try this:
val p = "*f*".r

was: (the same text, ending after "...not escaped before being passed to the regex.")
[jira] [Updated] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
[ https://issues.apache.org/jira/browse/SPARK-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14323:

Description:
The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "\*" is used in the LIKE clause, the expected results are not returned. This is because "\*" is not escaped before being passed to the regex. Unescaped, a pattern such as "\*f\*" throws an exception (PatternSyntaxException, "Dangling meta character") and thus returns an empty result.
Try this:
val p = "\*f\*".r

was: (the same text, with the wildcards in the prose written as unescaped *)
[jira] [Created] (SPARK-14323) [SQL] SHOW FUNCTIONS did not work properly
Bo Meng created SPARK-14323:

Summary: [SQL] SHOW FUNCTIONS did not work properly
Key: SPARK-14323
URL: https://issues.apache.org/jira/browse/SPARK-14323
Project: Spark
Issue Type: Bug
Reporter: Bo Meng
Priority: Minor

The SHOW FUNCTIONS syntax is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions
When "*" is used in the LIKE clause, the expected results are not returned. This is because "*" is not escaped before being passed to the regex.
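The "Dangling meta character" failure described in this issue is easy to reproduce outside Spark; a minimal standalone sketch:

```scala
import java.util.regex.PatternSyntaxException
import scala.util.Try

// An unescaped '*' at the start of a pattern has nothing to repeat, so
// compiling it throws PatternSyntaxException ("Dangling meta character").
val bad = Try("*f*".r)
assert(bad.isFailure)
assert(bad.failed.get.isInstanceOf[PatternSyntaxException])

// Translating the SQL wildcard '*' to the regex '.*' first yields a valid
// pattern: "*f*" becomes ".*f.*", which matches any name containing 'f'.
val good = "*f*".replaceAll("\\*", ".*").r
assert(good.pattern.matcher("showfunctions").matches())
assert(!good.pattern.matcher("tables").matches())
```

This is the same escaping issue the later comments resolve by routing SHOW FUNCTIONS through the shared pattern-matching path.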
[jira] [Updated] (SPARK-14294) Support native execution of ALTER TABLE ... RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-14294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14294:

Description:
Support native execution of ALTER TABLE ... RENAME TO.
The syntax for ALTER TABLE ... RENAME TO commands is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable

was: (the same text, with "commands are described" instead of "commands is described")

> Support native execution of ALTER TABLE ... RENAME TO
> Key: SPARK-14294
> URL: https://issues.apache.org/jira/browse/SPARK-14294
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Bo Meng
> Priority: Minor
[jira] [Commented] (SPARK-14129) [Table related commands] Alter table
[ https://issues.apache.org/jira/browse/SPARK-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219883#comment-15219883 ] Bo Meng commented on SPARK-14129:

This is for ALTER TABLE ... RENAME TO. I will be working on the rest of them.

> [Table related commands] Alter table
> Key: SPARK-14129
> URL: https://issues.apache.org/jira/browse/SPARK-14129
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
>
> For the alter table command, we have the following tokens:
> TOK_ALTERTABLE_RENAME
> TOK_ALTERTABLE_LOCATION
> TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES
> TOK_ALTERTABLE_SERIALIZER
> TOK_ALTERTABLE_SERDEPROPERTIES
> TOK_ALTERTABLE_CLUSTER_SORT
> TOK_ALTERTABLE_SKEWED
> For a data source table, let's implement TOK_ALTERTABLE_RENAME, TOK_ALTERTABLE_LOCATION, and TOK_ALTERTABLE_SERDEPROPERTIES. We need to decide what we do for TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. It will be used to allow users to correct the data format (e.g. changing csv to com.databricks.spark.csv to allow the table to be accessed by older versions of Spark).
> For a Hive table, we should implement all commands supported by the data source table plus TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. For TOK_ALTERTABLE_CLUSTER_SORT and TOK_ALTERTABLE_SKEWED, we should throw exceptions.
[jira] [Created] (SPARK-14294) Support native execution of ALTER TABLE ... RENAME TO
Bo Meng created SPARK-14294:

Summary: Support native execution of ALTER TABLE ... RENAME TO
Key: SPARK-14294
URL: https://issues.apache.org/jira/browse/SPARK-14294
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Bo Meng
Priority: Minor

Support native execution of ALTER TABLE ... RENAME TO.
The syntax for ALTER TABLE ... RENAME TO commands is described here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable