[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples

2016-05-24 Thread Vishnu Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298586#comment-15298586
 ] 

Vishnu Prasad commented on SPARK-12777:
---

[~janstenpickle] Could you please recheck and update the status of this issue?

> Dataset fields can't be Scala tuples
> 
>
> Key: SPARK-12777
> URL: https://issues.apache.org/jira/browse/SPARK-12777
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Chris Jansen
>
> Datasets can't seem to handle Scala tuples as fields of case classes.
> {code}
> Seq((1,2), (3,4)).toDS().show() //works
> {code}
> When including a tuple as a field, the code fails:
> {code}
> case class Test(v: (Int, Int))
> Seq(Test((1,2)), Test((3,4))).toDS().show() //fails
> {code}
> {code}
>   UnresolvedException: Invalid call to dataType on unresolved object, tree: 
> 'name  (unresolved.scala:59)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:59)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field$lzycompute(complexTypeExtractors.scala:107)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field(complexTypeExtractors.scala:107)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.If.toString(conditionalExpressions.scala:76)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:385)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:381)
>  org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:388)
>  org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:391)
>  
> org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:172)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:441)
>  org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:396)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:118)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:119)
>  org.apache.spark.Logging$class.logDebug(Logging.scala:62)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor.logDebug(RuleExecutor.scala:44)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:115)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
>  
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:253)
>  org.apache.spark.sql.Dataset.<init>(Dataset.scala:78)
>  org.apache.spark.sql.Dataset.<init>(Dataset.scala:89)
>  org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507)
>  
> org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:80)
> {code}
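> A possible workaround (a sketch, not from the original report, and untested 
> against 1.6) is to flatten the tuple into top-level case-class fields, which 
> avoids the tuple-typed field entirely:
> {code}
> case class TestFlat(v1: Int, v2: Int) // hypothetical flattened variant of Test
> Seq(TestFlat(1, 2), TestFlat(3, 4)).toDS().show() // works
> {code}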
> When providing a type alias, the code fails in a different way:
> {code}
> type TwoInt = (Int, Int)
> case class Test(v: TwoInt)
> Seq(Test((1,2)), Test((3,4))).toDS().show() //fails
> {code}
> {code}
>   NoSuchElementException: head of empty list  (ScalaReflection.scala:504)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:504)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:502)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor(ScalaReflection.scala:502)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:509)
>  
> 

[jira] [Commented] (SPARK-15467) Getting stack overflow when attempting to query a wide Dataset (>200 fields)

2016-05-22 Thread Vishnu Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295439#comment-15295439
 ] 

Vishnu Prasad commented on SPARK-15467:
---

Could this be related to 
https://issues.scala-lang.org/browse/SI-7296 

There are two sub-tasks that are still pending. 
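
If it helps triage, here is a mitigation sketch (assuming the overflow happens 
during Catalyst analysis on the calling thread): run the encoder-heavy code on a 
thread with a larger stack.

{code}
// Hypothetical sketch, not a fix: "Wide" is the 200+-field case class from the
// report below; the fourth Thread argument requests a 64 MB stack.
val t = new Thread(null, new Runnable {
  override def run(): Unit = Seq(Wide()).toDS().show()
}, "wide-ds-repro", 1L << 26)
t.start()
t.join()
{code}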

> Getting stack overflow when attempting to query a wide Dataset (>200 fields)
> 
>
> Key: SPARK-15467
> URL: https://issues.apache.org/jira/browse/SPARK-15467
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Don Drake
>
> This can be reproduced in a spark-shell; I am running Spark 2.0.0-preview.
> {code}
> import spark.implicits._
> case class Wide(
> val f0:String = "",
> val f1:String = "",
> val f2:String = "",
> val f3:String = "",
> val f4:String = "",
> val f5:String = "",
> val f6:String = "",
> val f7:String = "",
> val f8:String = "",
> val f9:String = "",
> val f10:String = "",
> val f11:String = "",
> val f12:String = "",
> val f13:String = "",
> val f14:String = "",
> val f15:String = "",
> val f16:String = "",
> val f17:String = "",
> val f18:String = "",
> val f19:String = "",
> val f20:String = "",
> val f21:String = "",
> val f22:String = "",
> val f23:String = "",
> val f24:String = "",
> val f25:String = "",
> val f26:String = "",
> val f27:String = "",
> val f28:String = "",
> val f29:String = "",
> val f30:String = "",
> val f31:String = "",
> val f32:String = "",
> val f33:String = "",
> val f34:String = "",
> val f35:String = "",
> val f36:String = "",
> val f37:String = "",
> val f38:String = "",
> val f39:String = "",
> val f40:String = "",
> val f41:String = "",
> val f42:String = "",
> val f43:String = "",
> val f44:String = "",
> val f45:String = "",
> val f46:String = "",
> val f47:String = "",
> val f48:String = "",
> val f49:String = "",
> val f50:String = "",
> val f51:String = "",
> val f52:String = "",
> val f53:String = "",
> val f54:String = "",
> val f55:String = "",
> val f56:String = "",
> val f57:String = "",
> val f58:String = "",
> val f59:String = "",
> val f60:String = "",
> val f61:String = "",
> val f62:String = "",
> val f63:String = "",
> val f64:String = "",
> val f65:String = "",
> val f66:String = "",
> val f67:String = "",
> val f68:String = "",
> val f69:String = "",
> val f70:String = "",
> val f71:String = "",
> val f72:String = "",
> val f73:String = "",
> val f74:String = "",
> val f75:String = "",
> val f76:String = "",
> val f77:String = "",
> val f78:String = "",
> val f79:String = "",
> val f80:String = "",
> val f81:String = "",
> val f82:String = "",
> val f83:String = "",
> val f84:String = "",
> val f85:String = "",
> val f86:String = "",
> val f87:String = "",
> val f88:String = "",
> val f89:String = "",
> val f90:String = "",
> val f91:String = "",
> val f92:String = "",
> val f93:String = "",
> val f94:String = "",
> val f95:String = "",
> val f96:String = "",
> val f97:String = "",
> val f98:String = "",
> val f99:String = "",
> val f100:String = "",
> val f101:String = "",
> val f102:String = "",
> val f103:String = "",
> val f104:String = "",
> val f105:String = "",
> val f106:String = "",
> val f107:String = "",
> val f108:String = "",
> val f109:String = "",
> val f110:String = "",
> val f111:String = "",
> val f112:String = "",
> val f113:String = "",
> val f114:String = "",
> val f115:String = "",
> val f116:String = "",
> val f117:String = "",
> val f118:String = "",
> val f119:String = "",
> val f120:String = "",
> val f121:String = "",
> val f122:String = "",
> val f123:String = "",
> val f124:String = "",
> val f125:String = "",
> val f126:String = "",
> val f127:String = "",
> val f128:String = "",
> val f129:String = "",
> val f130:String = "",
> val f131:String = "",
> val f132:String = "",
> val f133:String = "",
> val f134:String = "",
> val f135:String = "",
> val f136:String = "",
> val f137:String = "",
> val f138:String = "",
> val f139:String = "",
> val f140:String = "",
> val f141:String = "",
> val f142:String = "",
> val f143:String = "",
> val f144:String = "",
> val f145:String = "",
> val f146:String = "",
> val f147:String = "",
> val f148:String = "",
> val f149:String = "",
> val f150:String = "",
> val f151:String = "",
> val f152:String = "",
> val f153:String = "",
> val f154:String = "",
> val f155:String = "",
> val f156:String = "",
> val f157:String = "",
> val f158:String = "",
> val f159:String = "",
> val f160:String = "",
> val f161:String = "",
> val f162:String = "",
> val f163:String = "",
> val f164:String = "",
> val f165:String = "",
> val f166:String = "",
> val f167:String = "",
> val f168:String = "",
> val f169:String = "",
> val f170:String = "",
> val f171:String = "",
> val f172:String = "",
> val f173:String = "",
> val f174:String = 

[jira] [Comment Edited] (SPARK-14653) Remove NumericParser and jackson dependency from mllib-local

2016-04-21 Thread Vishnu Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251308#comment-15251308
 ] 

Vishnu Prasad edited comment on SPARK-14653 at 4/21/16 12:53 PM:
-

Hey, I've just written an example of a UDT for DenseVector. I'm assuming this is 
what I should do for all the classes in mllib-local that use NumericParser and 
jackson.

If what I'm doing is right, which package am I supposed to put the UDTs in? 
Currently I'm placing them at _package org.apache.spark.sql.types.udt_

https://github.com/vishnu667/spark/blob/SPARK-14653/sql/catalyst/src/main/scala/org/apache/spark/sql/types/udt/VectorUDT.scala


was (Author: vishnu667):
Hey, I've just written an example of a UDT for DenseVector. I'm assuming this is 
what I should do for all the classes in mllib-local that use NumericParser and 
jackson.

If what I'm doing is right, which package am I supposed to put the UDTs in?

{code:borderStyle=solid}
// Imports added so the example compiles on its own.
import org.apache.spark.ml.linalg.{DenseVector, Vectors}
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._
import org.json4s._

class DenseVectorUDT extends UserDefinedType[DenseVector] {

  // Needed for the json4s extract calls below.
  private implicit val formats: Formats = DefaultFormats

  // Internal SQL representation: an array of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(features: DenseVector): ArrayData = {
    new GenericArrayData(features.values)
  }

  override def userClass: Class[DenseVector] = classOf[DenseVector]

  // deserialize must return the user class (DenseVector), and it should accept
  // the ArrayData produced by serialize; the JValue branch keeps the legacy
  // JSON format that NumericParser/jackson used to handle.
  override def deserialize(datum: Any): DenseVector = datum match {
    case data: ArrayData =>
      new DenseVector(data.toDoubleArray())
    case jValue: JValue => (jValue \ "type").extract[Int] match {
      case 0 => // sparse
        val size = (jValue \ "size").extract[Int]
        val indices = (jValue \ "indices").extract[Seq[Int]].toArray
        val values = (jValue \ "values").extract[Seq[Double]].toArray
        Vectors.sparse(size, indices, values).toDense
      case 1 => // dense
        val values = (jValue \ "values").extract[Seq[Double]].toArray
        new DenseVector(values)
      case _ =>
        throw new IllegalArgumentException(s"Cannot parse $datum into a vector.")
    }
    case other =>
      throw new IllegalArgumentException(s"Cannot deserialize $other into a DenseVector.")
  }
}
{code}

> Remove NumericParser and jackson dependency from mllib-local
> 
>
> Key: SPARK-14653
> URL: https://issues.apache.org/jira/browse/SPARK-14653
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, ML
>Reporter: Xiangrui Meng
>
> After SPARK-14549, we should remove NumericParser and jackson from 
> mllib-local, which were introduced very early on and have now been replaced 
> by UDTs.






[jira] [Commented] (SPARK-14653) Remove NumericParser and jackson dependency from mllib-local

2016-04-20 Thread Vishnu Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251308#comment-15251308
 ] 

Vishnu Prasad commented on SPARK-14653:
---

Hey, I've just written an example of a UDT for DenseVector. I'm assuming this is 
what I should do for all the classes in mllib-local that use NumericParser and 
jackson.

If what I'm doing is right, which package am I supposed to put the UDTs in?

{code:borderStyle=solid}
// Imports added so the example compiles on its own.
import org.apache.spark.ml.linalg.{DenseVector, Vectors}
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._
import org.json4s._

class DenseVectorUDT extends UserDefinedType[DenseVector] {

  // Needed for the json4s extract calls below.
  private implicit val formats: Formats = DefaultFormats

  // Internal SQL representation: an array of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(features: DenseVector): ArrayData = {
    new GenericArrayData(features.values)
  }

  override def userClass: Class[DenseVector] = classOf[DenseVector]

  // deserialize must return the user class (DenseVector), and it should accept
  // the ArrayData produced by serialize; the JValue branch keeps the legacy
  // JSON format that NumericParser/jackson used to handle.
  override def deserialize(datum: Any): DenseVector = datum match {
    case data: ArrayData =>
      new DenseVector(data.toDoubleArray())
    case jValue: JValue => (jValue \ "type").extract[Int] match {
      case 0 => // sparse
        val size = (jValue \ "size").extract[Int]
        val indices = (jValue \ "indices").extract[Seq[Int]].toArray
        val values = (jValue \ "values").extract[Seq[Double]].toArray
        Vectors.sparse(size, indices, values).toDense
      case 1 => // dense
        val values = (jValue \ "values").extract[Seq[Double]].toArray
        new DenseVector(values)
      case _ =>
        throw new IllegalArgumentException(s"Cannot parse $datum into a vector.")
    }
    case other =>
      throw new IllegalArgumentException(s"Cannot deserialize $other into a DenseVector.")
  }
}
{code}
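
As a quick sanity check (a sketch, assuming serialize and deserialize should 
round-trip through the ArrayData representation):

{code}
val udt = new DenseVectorUDT
val v = new DenseVector(Array(1.0, 2.0, 3.0))
assert(udt.deserialize(udt.serialize(v)) == v) // round-trips via ArrayData
{code}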

> Remove NumericParser and jackson dependency from mllib-local
> 
>
> Key: SPARK-14653
> URL: https://issues.apache.org/jira/browse/SPARK-14653
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, ML
>Reporter: Xiangrui Meng
>
> After SPARK-14549, we should remove NumericParser and jackson from 
> mllib-local, which were introduced very early on and have now been replaced 
> by UDTs.






[jira] [Commented] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices

2016-04-19 Thread Vishnu Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249080#comment-15249080
 ] 

Vishnu Prasad commented on SPARK-14739:
---

I've merged your PR with your test fixes. Thank you!

> Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with 
> no indices
> ---
>
> Key: SPARK-14739
> URL: https://issues.apache.org/jira/browse/SPARK-14739
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Maciej Szymkiewicz
>
> DenseVector:
> {code}
> Vectors.parse(str(Vectors.dense([])))
> ## ValueError Traceback (most recent call last)
> ## ... 
> ## ValueError: Unable to parse values from
> {code}
> SparseVector:
> {code}
> Vectors.parse(str(Vectors.sparse(5, [], [])))
> ## ValueError Traceback (most recent call last)
> ##  ... 
> ## ValueError: Unable to parse indices from .
> {code}


