[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples
[ https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298586#comment-15298586 ]

Vishnu Prasad commented on SPARK-12777:
---------------------------------------

[~janstenpickle] Could you please recheck and change the status of the issue?

> Dataset fields can't be Scala tuples
> ------------------------------------
>
>                 Key: SPARK-12777
>                 URL: https://issues.apache.org/jira/browse/SPARK-12777
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0, 1.6.1, 2.0.0
>            Reporter: Chris Jansen
>
> Datasets can't seem to handle Scala tuples as fields of case classes.
>
> {code}
> Seq((1,2), (3,4)).toDS().show() // works
> {code}
>
> When including a tuple as a field, the code fails:
>
> {code}
> case class Test(v: (Int, Int))
> Seq(Test((1,2)), Test((3,4))).toDS().show() // fails
> {code}
>
> {code}
> UnresolvedException: Invalid call to dataType on unresolved object, tree: 'name (unresolved.scala:59)
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:59)
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field$lzycompute(complexTypeExtractors.scala:107)
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field(complexTypeExtractors.scala:107)
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
> org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:111)
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
> org.apache.spark.sql.catalyst.expressions.If.toString(conditionalExpressions.scala:76)
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
> org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:385)
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:381)
> org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:388)
> org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:391)
> org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:172)
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:441)
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:396)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:118)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:119)
> org.apache.spark.Logging$class.logDebug(Logging.scala:62)
> org.apache.spark.sql.catalyst.rules.RuleExecutor.logDebug(RuleExecutor.scala:44)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:115)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:253)
> org.apache.spark.sql.Dataset.<init>(Dataset.scala:78)
> org.apache.spark.sql.Dataset.<init>(Dataset.scala:89)
> org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507)
> org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:80)
> {code}
>
> When providing a type alias, the code fails in a different way:
>
> {code}
> type TwoInt = (Int, Int)
> case class Test(v: TwoInt)
> Seq(Test((1,2)), Test((3,4))).toDS().show() // fails
> {code}
>
> {code}
> NoSuchElementException: head of empty list (ScalaReflection.scala:504)
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:504)
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:502)
> org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor(ScalaReflection.scala:502)
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:509)
> {code}
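A possible workaround, offered here as a sketch rather than anything from the report above: giving the encoder a nested case class instead of a tuple field avoids the unresolved tuple attributes. The `Pair` and `Test2` names are illustrative:

{code}
// Sketch only. Assumes a spark-shell, so toDS() and its implicits are in scope.
case class Pair(a: Int, b: Int)
case class Test2(v: Pair)

Seq(Test2(Pair(1, 2)), Test2(Pair(3, 4))).toDS().show() // resolves correctly
{code}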
[jira] [Commented] (SPARK-15467) Getting stack overflow when attempting to query a wide Dataset (>200 fields)
[ https://issues.apache.org/jira/browse/SPARK-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295439#comment-15295439 ]

Vishnu Prasad commented on SPARK-15467:
---------------------------------------

Could this be related to https://issues.scala-lang.org/browse/SI-7296? There are two sub-tasks that are still pending.

> Getting stack overflow when attempting to query a wide Dataset (>200 fields)
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-15467
>                 URL: https://issues.apache.org/jira/browse/SPARK-15467
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Don Drake
>
> This can be duplicated in a spark-shell; I am running Spark 2.0.0-preview.
>
> {code}
> import spark.implicits._
>
> case class Wide(
>   val f0: String = "", val f1: String = "", val f2: String = "", val f3: String = "", val f4: String = "",
>   val f5: String = "", val f6: String = "", val f7: String = "", val f8: String = "", val f9: String = "",
>   val f10: String = "", val f11: String = "", val f12: String = "", val f13: String = "", val f14: String = "",
>   val f15: String = "", val f16: String = "", val f17: String = "", val f18: String = "", val f19: String = "",
>   val f20: String = "", val f21: String = "", val f22: String = "", val f23: String = "", val f24: String = "",
>   val f25: String = "", val f26: String = "", val f27: String = "", val f28: String = "", val f29: String = "",
>   val f30: String = "", val f31: String = "", val f32: String = "", val f33: String = "", val f34: String = "",
>   val f35: String = "", val f36: String = "", val f37: String = "", val f38: String = "", val f39: String = "",
>   val f40: String = "", val f41: String = "", val f42: String = "", val f43: String = "", val f44: String = "",
>   val f45: String = "", val f46: String = "", val f47: String = "", val f48: String = "", val f49: String = "",
>   val f50: String = "", val f51: String = "", val f52: String = "", val f53: String = "", val f54: String = "",
>   val f55: String = "", val f56: String = "", val f57: String = "", val f58: String = "", val f59: String = "",
>   val f60: String = "", val f61: String = "", val f62: String = "", val f63: String = "", val f64: String = "",
>   val f65: String = "", val f66: String = "", val f67: String = "", val f68: String = "", val f69: String = "",
>   val f70: String = "", val f71: String = "", val f72: String = "", val f73: String = "", val f74: String = "",
>   val f75: String = "", val f76: String = "", val f77: String = "", val f78: String = "", val f79: String = "",
>   val f80: String = "", val f81: String = "", val f82: String = "", val f83: String = "", val f84: String = "",
>   val f85: String = "", val f86: String = "", val f87: String = "", val f88: String = "", val f89: String = "",
>   val f90: String = "", val f91: String = "", val f92: String = "", val f93: String = "", val f94: String = "",
>   val f95: String = "", val f96: String = "", val f97: String = "", val f98: String = "", val f99: String = "",
>   val f100: String = "", val f101: String = "", val f102: String = "", val f103: String = "", val f104: String = "",
>   val f105: String = "", val f106: String = "", val f107: String = "", val f108: String = "", val f109: String = "",
>   val f110: String = "", val f111: String = "", val f112: String = "", val f113: String = "", val f114: String = "",
>   val f115: String = "", val f116: String = "", val f117: String = "", val f118: String = "", val f119: String = "",
>   val f120: String = "", val f121: String = "", val f122: String = "", val f123: String = "", val f124: String = "",
>   val f125: String = "", val f126: String = "", val f127: String = "", val f128: String = "", val f129: String = "",
>   val f130: String = "", val f131: String = "", val f132: String = "", val f133: String = "", val f134: String = "",
>   val f135: String = "", val f136: String = "", val f137: String = "", val f138: String = "", val f139: String = "",
>   val f140: String = "", val f141: String = "", val f142: String = "", val f143: String = "", val f144: String = "",
>   val f145: String = "", val f146: String = "", val f147: String = "", val f148: String = "", val f149: String = "",
>   val f150: String = "", val f151: String = "", val f152: String = "", val f153: String = "", val f154: String = "",
>   val f155: String = "", val f156: String = "", val f157: String = "", val f158: String = "", val f159: String = "",
>   val f160: String = "", val f161: String = "", val f162: String = "", val f163: String = "", val f164: String = "",
>   val f165: String = "", val f166: String = "", val f167: String = "", val f168: String = "", val f169: String = "",
>   val f170: String = "", val f171: String = "", val f172: String = "", val f173: String = "", val f174: String =
> {code}
[jira] [Comment Edited] (SPARK-14653) Remove NumericParser and jackson dependency from mllib-local
[ https://issues.apache.org/jira/browse/SPARK-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251308#comment-15251308 ]

Vishnu Prasad edited comment on SPARK-14653 at 4/21/16 12:53 PM:
-----------------------------------------------------------------

Hey, I've just written an example of a UDT for DenseVector. I'm assuming this is what I should do for all the classes in mllib-local that use NumericParser and jackson. If what I'm doing is right, which package am I supposed to put the UDTs in? Currently I'm placing them at _package org.apache.spark.sql.types.udt_

https://github.com/vishnu667/spark/blob/SPARK-14653/sql/catalyst/src/main/scala/org/apache/spark/sql/types/udt/VectorUDT.scala

was (Author: vishnu667):
Hey, I've just written an example of a UDT for DenseVector. I'm assuming this is what I should do for all the classes in mllib-local that use NumericParser and jackson. If what I'm doing is right, which package am I supposed to put the UDTs in?

{code:borderStyle=solid}
// Imports added for completeness; the vector types are assumed to be
// mllib-local's org.apache.spark.ml.linalg.
import org.apache.spark.ml.linalg.{DenseVector, Vectors}
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._
import org.json4s._

class DenseVectorUDT extends UserDefinedType[DenseVector] {

  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(features: DenseVector): ArrayData = {
    new GenericArrayData(features.values)
  }

  override def userClass: Class[DenseVector] = classOf[DenseVector]

  // Must return DenseVector to override UserDefinedType[DenseVector].
  override def deserialize(datum: Any): DenseVector = datum match {
    case jValue: JValue =>
      implicit val formats: Formats = DefaultFormats // required by extract
      (jValue \ "type").extract[Int] match {
        case 0 => // sparse
          val size = (jValue \ "size").extract[Int]
          val indices = (jValue \ "indices").extract[Seq[Int]].toArray
          val values = (jValue \ "values").extract[Seq[Double]].toArray
          Vectors.sparse(size, indices, values).toDense
        case 1 => // dense
          val values = (jValue \ "values").extract[Seq[Double]].toArray
          Vectors.dense(values).toDense
        case tpe =>
          throw new IllegalArgumentException(s"Cannot parse type $tpe into a vector.")
      }
    case other =>
      throw new IllegalArgumentException(s"Cannot deserialize $other into a DenseVector.")
  }
}
{code}

> Remove NumericParser and jackson dependency from mllib-local
> -------------------------------------------------------------
>
>                 Key: SPARK-14653
>                 URL: https://issues.apache.org/jira/browse/SPARK-14653
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build, ML
>            Reporter: Xiangrui Meng
>
> After SPARK-14549, we should remove NumericParser and jackson from
> mllib-local, which were introduced much earlier and have now been replaced by UDTs.
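For context, a self-contained sketch of how a UDT is normally attached to its user class via the @SQLUserDefinedType annotation; the toy `Point`/`PointUDT` pair is hypothetical, and mllib-local classes would presumably need a different mechanism since mllib-local does not depend on spark-sql:

{code}
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._

// Hypothetical user class, wired to its UDT via the annotation so
// Catalyst can discover the encoding.
@SQLUserDefinedType(udt = classOf[PointUDT])
class Point(val x: Double, val y: Double)

class PointUDT extends UserDefinedType[Point] {
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
  override def serialize(p: Point): ArrayData = new GenericArrayData(Array(p.x, p.y))
  override def deserialize(datum: Any): Point = datum match {
    case a: ArrayData => new Point(a.getDouble(0), a.getDouble(1))
  }
  override def userClass: Class[Point] = classOf[Point]
}
{code}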
[jira] [Commented] (SPARK-14653) Remove NumericParser and jackson dependency from mllib-local
[ https://issues.apache.org/jira/browse/SPARK-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251308#comment-15251308 ]

Vishnu Prasad commented on SPARK-14653:
---------------------------------------

Hey, I've just written an example of a UDT for DenseVector. I'm assuming this is what I should do for all the classes in mllib-local that use NumericParser and jackson. If what I'm doing is right, which package am I supposed to put the UDTs in?

{code:borderStyle=solid}
// Imports added for completeness; the vector types are assumed to be
// mllib-local's org.apache.spark.ml.linalg.
import org.apache.spark.ml.linalg.{DenseVector, Vectors}
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._
import org.json4s._

class DenseVectorUDT extends UserDefinedType[DenseVector] {

  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(features: DenseVector): ArrayData = {
    new GenericArrayData(features.values)
  }

  override def userClass: Class[DenseVector] = classOf[DenseVector]

  // Must return DenseVector to override UserDefinedType[DenseVector].
  override def deserialize(datum: Any): DenseVector = datum match {
    case jValue: JValue =>
      implicit val formats: Formats = DefaultFormats // required by extract
      (jValue \ "type").extract[Int] match {
        case 0 => // sparse
          val size = (jValue \ "size").extract[Int]
          val indices = (jValue \ "indices").extract[Seq[Int]].toArray
          val values = (jValue \ "values").extract[Seq[Double]].toArray
          Vectors.sparse(size, indices, values).toDense
        case 1 => // dense
          val values = (jValue \ "values").extract[Seq[Double]].toArray
          Vectors.dense(values).toDense
        case tpe =>
          throw new IllegalArgumentException(s"Cannot parse type $tpe into a vector.")
      }
    case other =>
      throw new IllegalArgumentException(s"Cannot deserialize $other into a DenseVector.")
  }
}
{code}

> Remove NumericParser and jackson dependency from mllib-local
> -------------------------------------------------------------
>
>                 Key: SPARK-14653
>                 URL: https://issues.apache.org/jira/browse/SPARK-14653
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build, ML
>            Reporter: Xiangrui Meng
>
> After SPARK-14549, we should remove NumericParser and jackson from
> mllib-local, which were introduced much earlier and have now been replaced by UDTs.
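A quick round-trip sketch of the serialize path above (illustrative; assumes mllib-local's org.apache.spark.ml.linalg vectors):

{code}
import org.apache.spark.ml.linalg.Vectors

val udt = new DenseVectorUDT
val dv = Vectors.dense(1.0, 2.0, 3.0).toDense

// serialize yields the Catalyst form declared by sqlType: an ArrayData
// wrapping the raw double values.
val encoded = udt.serialize(dv)
assert(encoded.toDoubleArray().sameElements(dv.values))
{code}

Note the asymmetry: deserialize as written expects a JValue, not the ArrayData that serialize produces, so the two sides would need to be aligned before a full round-trip works.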
[jira] [Commented] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices
[ https://issues.apache.org/jira/browse/SPARK-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249080#comment-15249080 ]

Vishnu Prasad commented on SPARK-14739:
---------------------------------------

I've merged your PR with your test fixes. Thank you!

> Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-14739
>                 URL: https://issues.apache.org/jira/browse/SPARK-14739
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.6.0, 2.0.0
>            Reporter: Maciej Szymkiewicz
>
> DenseVector:
>
> {code}
> Vectors.parse(str(Vectors.dense([])))
> ## ValueError Traceback (most recent call last)
> ## ..
> ## ValueError: Unable to parse values from
> {code}
>
> SparseVector:
>
> {code}
> Vectors.parse(str(Vectors.sparse(5, [], [])))
> ## ValueError Traceback (most recent call last)
> ## ...
> ## ValueError: Unable to parse indices from .
> {code}
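For comparison, the Scala-side API that the Python implementation mirrors; a sketch of the round-trips this report expects to succeed (illustrative, not a verified test):

{code}
import org.apache.spark.mllib.linalg.Vectors

// Empty dense vector: "[]" should parse back to a size-0 DenseVector.
val emptyDense = Vectors.parse(Vectors.dense(Array.empty[Double]).toString)

// Sparse vector with no indices: "(5,[],[])" should parse back intact.
val emptySparse = Vectors.parse(Vectors.sparse(5, Array.empty[Int], Array.empty[Double]).toString)
{code}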