[ https://issues.apache.org/jira/browse/SPARK-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241181#comment-15241181 ]
Michael Zieliński commented on SPARK-14632:
-------------------------------------------

FYI, this bug was only introduced in 1.6.1.

> randomSplit method fails on dataframes with maps in schema
> ----------------------------------------------------------
>
>                 Key: SPARK-14632
>                 URL: https://issues.apache.org/jira/browse/SPARK-14632
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>            Reporter: Stefano Costantini
>
> Applying the randomSplit method to a dataframe with at least one map in the
> schema results in the following exception:
> {noformat}
> org.apache.spark.sql.AnalysisException: cannot resolve 'features ASC' due to
> data type mismatch: cannot sort data type map<string,double>;
> {noformat}
> This bug can be reproduced as follows:
> {code}
> // Run in spark-shell (1.6.1), where sc is predefined
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val arr = Array(
>   ("user1", Map("f1" -> 1.0, "f2" -> 1.0)),
>   ("user2", Map("f2" -> 1.0, "f3" -> 1.0)),
>   ("user3", Map("f1" -> 1.0, "f2" -> 1.0)))
> val df = sc.parallelize(arr).toDF("user", "features")
> df.printSchema
> // On 1.6.1 this line throws the AnalysisException shown above
> val Array(split1, split2) = df.randomSplit(Array(0.7, 0.3), seed = 101L)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
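For context, the `'features ASC'` in the error suggests that randomSplit sorts the rows by every column before sampling, and a map<string,double> column cannot be sorted. The Scala sketch below is a simplified model, not Spark's source: `RandomSplitModel`, `bounds` and `randomSplit` are hypothetical names, and `scala.util.Random` stands in for Spark's own sampler. It illustrates why a fixed row order matters for this kind of split: the weights are normalised into cumulative bounds, and each split is a separate pass over the rows with the same seed, so every pass has to pair the same random draws with the same rows.

```scala
import scala.util.Random

// Simplified model of a seeded random split (assumption: not Spark's actual code).
// Weights are normalised into cumulative bounds, then each split is one pass over
// the rows with the SAME seed: one uniform draw per row, in row order, keeping the
// row if the draw falls in that split's [lower, upper) interval.
object RandomSplitModel {

  // e.g. Array(0.7, 0.3) -> Vector(0.0, 0.7, 1.0)
  def bounds(weights: Array[Double]): Vector[Double] = {
    val total = weights.sum
    weights.map(_ / total).scanLeft(0.0)(_ + _).toVector
  }

  // Hypothetical helper: returns one Vector of rows per weight.
  def randomSplit[T](rows: Seq[T], weights: Array[Double], seed: Long): Vector[Vector[T]] = {
    val ordered = rows.toVector // the row order every pass relies on
    val b = bounds(weights)
    weights.indices.toVector.map { i =>
      val rng = new Random(seed) // fresh generator with the same seed for every pass
      ordered.filter { _ =>
        val x = rng.nextDouble() // uniform in [0.0, 1.0)
        x >= b(i) && x < b(i + 1)
      }
    }
  }
}
```

With the rows in a fixed order, `RandomSplitModel.randomSplit(rows, Array(0.7, 0.3), seed = 101L)` returns two disjoint splits that together cover every row. If a later pass saw the rows in a different order, the same draws would be paired with different rows, so a row could end up in both splits or in neither. Sorting each partition by all columns is one way to pin that order, and in this model it is exactly that step which cannot handle a map column.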