[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection
[ https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Konwinski updated SPARK-4959:
----------------------------------
Description:

Per [~marmbrus], see this line of code, where we should be using an attribute map:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147

To reproduce, I ran the following in the Spark shell:

{code}
import sqlContext._
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing:

{code}
java.util.NoSuchElementException: key not found: casesensitivecolname#23046
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
	at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
	at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
	at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:446)
	at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:108)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
{code}

was:
Per [~marmbrus], see this line of code, where we should be using an attribute map:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147

To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: case
[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection
[ https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Konwinski updated SPARK-4959:
----------------------------------
Description:

Per [~marmbrus], see this line of code, where we should be using an attribute map:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147

To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing:

{code}
java.util.NoSuchElementException: key not found: casesensitivecolname#23046
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
	at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
	at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
	at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:446)
	at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:108)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
{code}

was:
To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing:

{code}
java.util.NoSuchElementException: key not found: casesensi
[jira] [Created] (SPARK-4959) Attributes are case sensitive when using a select query from a projection
Andy Konwinski created SPARK-4959:
-------------------------------------

Summary: Attributes are case sensitive when using a select query from a projection
Key: SPARK-4959
URL: https://issues.apache.org/jira/browse/SPARK-4959
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: Andy Konwinski

To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing:

{code}
java.util.NoSuchElementException: key not found: casesensitivecolname#23046
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
	at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
	at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
	at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
	at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:446)
	at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:108)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
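
A note for readers of the SPARK-4959 thread above: the remark that the code "should be using an attribute map" can be illustrated with a small sketch. It assumes that a lookup keyed on the whole attribute compares the name with its case, while an attribute map keys on the expression ID alone. `Attr`, `ExprId`, `byAttribute`, and `byExprId` are hypothetical stand-ins written for this illustration. They are not Catalyst's classes, and this is not a claim about the actual fix.

```scala
// Hypothetical stand-ins for Catalyst's attribute types; not Spark's real API.
final case class ExprId(id: Long)
final case class Attr(name: String, exprId: ExprId)

object AttributeLookupSketch {
  // Keyed by the whole attribute: the name (including its case) is part of the key.
  def byAttribute[A](pairs: Seq[(Attr, A)]): Map[Attr, A] = pairs.toMap

  // Keyed by expression ID only: the spelling of the name plays no role.
  def byExprId[A](pairs: Seq[(Attr, A)]): Map[ExprId, A] =
    pairs.map { case (attr, value) => attr.exprId -> value }.toMap

  def main(args: Array[String]): Unit = {
    val declared = Attr("CaseSensitiveColName", ExprId(23046L))
    // The same column, spelled the way a case-insensitive query might refer to it.
    val referenced = Attr("casesensitivecolname", ExprId(23046L))
    val columns = Seq(declared -> "col1")

    println(byAttribute(columns).get(referenced))     // misses: the names differ in case
    println(byExprId(columns).get(referenced.exprId)) // hits: the IDs match
  }
}
```

The point of the sketch is only that identity by ID is unaffected by how the analyzer spells the column name.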
[jira] [Commented] (SPARK-3184) Allow user to specify num tasks to use for a table
[ https://issues.apache.org/jira/browse/SPARK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107748#comment-14107748 ]

Andy Konwinski commented on SPARK-3184:
---------------------------------------

[~marmbrus], did we figure out if this feature is in fact missing right now?

> Allow user to specify num tasks to use for a table
> --------------------------------------------------
>
> Key: SPARK-3184
> URL: https://issues.apache.org/jira/browse/SPARK-3184
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Andy Konwinski
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (SPARK-3184) Allow user to specify num tasks to use for a table
Andy Konwinski created SPARK-3184:
-------------------------------------

Summary: Allow user to specify num tasks to use for a table
Key: SPARK-3184
URL: https://issues.apache.org/jira/browse/SPARK-3184
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Andy Konwinski
[jira] [Commented] (SPARK-1882) Support dynamic memory sharing in Mesos
[ https://issues.apache.org/jira/browse/SPARK-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003893#comment-14003893 ]

Andy Konwinski commented on SPARK-1882:
---------------------------------------

It seems like the problem is with heterogeneous environments (machines with different memory/CPU ratios). One idea is to replace the single value that each Spark executor requires with a bit of conditional logic (e.g. if accepting a partial slot would leave the machine with less than X GB of memory, just accept all the memory in the offer; otherwise, accept default_slot_mem_size). That way a range of values would work, which could help reduce fragmentation.

Also, I'm not sure if Mesos will tell you in a resource offer how much total memory the machine contains (in addition to how much is currently being offered from that machine), but I'm pretty sure you can get access to that value from Mesos somehow. You could also use that value when deciding whether to accept resources (to lower the chance of fragmentation).

> Support dynamic memory sharing in Mesos
> ---------------------------------------
>
> Key: SPARK-1882
> URL: https://issues.apache.org/jira/browse/SPARK-1882
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 1.0.0
> Reporter: Andrew Ash
>
> Fine grained mode Mesos currently supports sharing CPUs very well, but
> requires that memory be pre-partitioned according to the executor memory
> parameter. Mesos supports dynamic memory allocation in addition to dynamic
> CPU allocation, so we should utilize this feature in Spark.
> See below where when the Mesos backend accepts a resource offer it only
> checks that there's enough memory to cover sc.executorMemory, and doesn't
> ever take a fraction of the memory available. The memory offer is accepted
> all or nothing from a pre-defined parameter.
> Coarse mode:
> https://github.com/apache/spark/blob/3ce526b168050c572a1feee8e0121e1426f7d9ee/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala#L208
> Fine mode:
> https://github.com/apache/spark/blob/a5150d199ca97ab2992bc2bb221a3ebf3d3450ba/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L114
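
The conditional logic suggested in the comment above could be sketched roughly as follows. This is only an illustration of the idea, not code from Spark or Mesos: `memoryToAccept`, `defaultSlotMb`, and `minLeftoverMb` are names made up here, no Mesos API is called, and the branch that declines offers smaller than one slot is an assumption based on the all-or-nothing check described in the issue.

```scala
object MemoryOfferSketch {
  /** Decide how much memory (in MB) to take from a single resource offer.
    *
    * Hypothetical parameters, named for this sketch only:
    *   offeredMb     - memory available in the offer
    *   defaultSlotMb - the usual per-executor memory (default_slot_mem_size in the comment)
    *   minLeftoverMb - the "X GB" threshold below which a remainder counts as a fragment
    *
    * Returns None when the offer should be declined.
    */
  def memoryToAccept(offeredMb: Long, defaultSlotMb: Long, minLeftoverMb: Long): Option[Long] = {
    if (offeredMb < defaultSlotMb) {
      None // assumption: an offer smaller than one slot is declined
    } else if (offeredMb - defaultSlotMb < minLeftoverMb) {
      Some(offeredMb) // the remainder would be too small to use, so take everything
    } else {
      Some(defaultSlotMb) // plenty left over, so take just the default slot
    }
  }

  def main(args: Array[String]): Unit = {
    // A 2 GB default slot with a 512 MB fragmentation threshold.
    println(memoryToAccept(offeredMb = 1024L, defaultSlotMb = 2048L, minLeftoverMb = 512L))
    println(memoryToAccept(offeredMb = 2304L, defaultSlotMb = 2048L, minLeftoverMb = 512L))
    println(memoryToAccept(offeredMb = 8192L, defaultSlotMb = 2048L, minLeftoverMb = 512L))
  }
}
```

If the machine's total memory is available from Mesos, as the comment speculates, the threshold could be derived from that total instead of being a fixed value.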