[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2014-12-24 Thread Andy Konwinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Konwinski updated SPARK-4959:
--
Description: 
Per [~marmbrus], see this line of code, where we should be using an attribute 
map
 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
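
One possible reading of the note above, sketched with hypothetical stand-in types rather than Catalyst's real classes: `Attr` is an assumed simplification of an attribute (a name plus a unique expression ID), and `byId` only imitates the idea of an attribute map by keying on the ID alone. A plain `Map` keyed by the whole attribute misses when only the case of the name differs, while the ID-keyed lookup still succeeds. The ID value is borrowed from the error message purely for illustration.

```scala
// Hypothetical stand-in for a Catalyst attribute: a name plus a unique expression ID.
final case class Attr(name: String, exprId: Long)

object AttributeLookupSketch {
  val original = Attr("CaseSensitiveColName", 23046L)
  // Same expression ID, but the name was lower-cased somewhere along the way (assumption).
  val lowered = Attr("casesensitivecolname", 23046L)

  // Plain Map: case-class equality compares the name too, so the lookup misses.
  val byAttribute: Map[Attr, String] = Map(original -> "col1")

  // ID-keyed map (the idea behind an attribute map): the case of the name is irrelevant.
  val byId: Map[Long, String] = Map(original.exprId -> "col1")

  def main(args: Array[String]): Unit = {
    println(byAttribute.get(lowered))  // None
    println(byId.get(lowered.exprId))  // Some(col1)
  }
}
```

The sketch does not claim to reproduce the optimizer code at the linked line; it only shows why keying on the full attribute is fragile when names can change case.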

To reproduce, I ran the following in the Spark shell:

{code}
import sqlContext._
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
"col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing: 
{code}
java.util.NoSuchElementException: key not found: casesensitivecolname#23046
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
    at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:446)
    at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:108)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
{code}

  was:
Per [~marmbrus], see this line of code, where we should be using an attribute 
map
 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147

To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
"col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: case

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2014-12-24 Thread Andy Konwinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Konwinski updated SPARK-4959:
--
Description: 
Per [~marmbrus], see this line of code, where we should be using an attribute 
map
 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147

To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
"col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing: 
{code}
java.util.NoSuchElementException: key not found: casesensitivecolname#23046
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
    at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:446)
    at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:108)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
{code}

  was:
To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
"col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing: 
{code}
java.util.NoSuchElementException: key not found: casesensi

[jira] [Created] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2014-12-24 Thread Andy Konwinski (JIRA)
Andy Konwinski created SPARK-4959:
-

         Summary: Attributes are case sensitive when using a select query from a projection
             Key: SPARK-4959
             URL: https://issues.apache.org/jira/browse/SPARK-4959
         Project: Spark
      Issue Type: Bug
      Components: SQL
Affects Versions: 1.2.0
        Reporter: Andy Konwinski


To reproduce, I ran the following in the Spark shell:

{code}
sql("drop table if exists test")
sql("create table test (col1 string)")
sql("""insert into table test select "hi" from prejoined limit 1""")
import sqlContext._
val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
"col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
sqlContext.table("test").select(projection:_*).registerTempTable("test2")

// This succeeds.
sql("select CaseSensitiveColName from test2").first()

// This fails with java.util.NoSuchElementException: key not found: casesensitivecolname#23046
sql("select casesensitivecolname from test2").first()
{code}

The full stack trace printed for the final command that is failing: 
{code}
java.util.NoSuchElementException: key not found: casesensitivecolname#23046
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
    at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:446)
    at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:108)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3184) Allow user to specify num tasks to use for a table

2014-08-22 Thread Andy Konwinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107748#comment-14107748
 ] 

Andy Konwinski commented on SPARK-3184:
---

[~marmbrus], did we figure out if this feature is in fact missing right now?

> Allow user to specify num tasks to use for a table
> --
>
> Key: SPARK-3184
> URL: https://issues.apache.org/jira/browse/SPARK-3184
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Andy Konwinski
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Created] (SPARK-3184) Allow user to specify num tasks to use for a table

2014-08-22 Thread Andy Konwinski (JIRA)
Andy Konwinski created SPARK-3184:
-

     Summary: Allow user to specify num tasks to use for a table
         Key: SPARK-3184
         URL: https://issues.apache.org/jira/browse/SPARK-3184
     Project: Spark
  Issue Type: Improvement
  Components: SQL
    Reporter: Andy Konwinski









[jira] [Commented] (SPARK-1882) Support dynamic memory sharing in Mesos

2014-05-20 Thread Andy Konwinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003893#comment-14003893
 ] 

Andy Konwinski commented on SPARK-1882:
---

It seems like the problem is with heterogeneous environments (machines with 
different memory/CPU ratios).

One idea is to replace the single fixed value that each Spark executor 
requires with a bit of conditional logic: e.g. if accepting a partial slot 
would leave the machine with less than X GB of memory, just accept all the 
memory in the offer; otherwise, accept default_slot_mem_size. That way a range 
of values would work, which could help reduce fragmentation.
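
The conditional logic suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `memoryToAccept`, `defaultSlotMemMb`, and `minLeftoverMb` (standing in for the "X GB" threshold) are hypothetical names, not Spark or Mesos APIs, the offer is reduced to a single number of megabytes, and declining an offer smaller than one slot is an added assumption.

```scala
object OfferMemorySketch {
  /**
   * Decide how much memory (in MB) to take from one resource offer.
   * All names and parameters here are hypothetical.
   *
   * @param offeredMb        memory currently offered by the machine
   * @param defaultSlotMemMb default per-executor slot size
   * @param minLeftoverMb    a remainder smaller than this counts as a useless fragment
   * @return Some(amount to accept), or None if the offer cannot hold a default slot
   */
  def memoryToAccept(offeredMb: Long, defaultSlotMemMb: Long, minLeftoverMb: Long): Option[Long] = {
    if (offeredMb < defaultSlotMemMb) {
      None // not enough memory for even one slot: decline (assumption)
    } else if (offeredMb - defaultSlotMemMb < minLeftoverMb) {
      Some(offeredMb) // a partial slot would strand a small fragment: take everything
    } else {
      Some(defaultSlotMemMb) // plenty would be left over: take only the default slot
    }
  }

  def main(args: Array[String]): Unit = {
    println(memoryToAccept(offeredMb = 5000, defaultSlotMemMb = 4096, minLeftoverMb = 1024))  // Some(5000)
    println(memoryToAccept(offeredMb = 16384, defaultSlotMemMb = 4096, minLeftoverMb = 1024)) // Some(4096)
    println(memoryToAccept(offeredMb = 2048, defaultSlotMemMb = 4096, minLeftoverMb = 1024))  // None
  }
}
```

With a 4096 MB slot and a 1024 MB threshold, an offer of 5000 MB is taken whole because only 904 MB would remain, while a 16384 MB offer yields just the default slot.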

Also, I'm not sure whether Mesos tells you in a resource offer how much total 
memory the machine has (in addition to how much is currently being offered 
from that machine), but I'm pretty sure you can get that value from Mesos 
somehow. You could also use that value when deciding whether to accept 
resources (to lower the chance of fragmentation).

> Support dynamic memory sharing in Mesos
> ---
>
> Key: SPARK-1882
> URL: https://issues.apache.org/jira/browse/SPARK-1882
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.0.0
>Reporter: Andrew Ash
>
> Fine grained mode Mesos currently supports sharing CPUs very well, but 
> requires that memory be pre-partitioned according to the executor memory 
> parameter.  Mesos supports dynamic memory allocation in addition to dynamic 
> CPU allocation, so we should utilize this feature in Spark.
> See below: when the Mesos backend accepts a resource offer, it only 
> checks that there's enough memory to cover sc.executorMemory, and never 
> takes a fraction of the memory available.  The memory offer is accepted 
> all or nothing, based on a pre-defined parameter.
> Coarse mode:
> https://github.com/apache/spark/blob/3ce526b168050c572a1feee8e0121e1426f7d9ee/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala#L208
> Fine mode:
> https://github.com/apache/spark/blob/a5150d199ca97ab2992bc2bb221a3ebf3d3450ba/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L114


