Rakesh Partapsing created SPARK-27758:
-----------------------------------------

Summary: Features won't generate after 1M rows
Key: SPARK-27758
URL: https://issues.apache.org/jira/browse/SPARK-27758
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.1.0
Reporter: Rakesh Partapsing

I am trying to fit a huge dataset with ALS. The model I use:

    val als = new ALS()
      .setImplicitPrefs(true)
      .setNonnegative(true)
      .setUserCol("userIndex")
      .setItemCol("itemIndex")
      .setRatingCol("count")
      .setMaxIter(20)
      .setRank(40)
      .setRegParam(0.5)
      .setNumUserBlocks(20)
      .setNumItemBlocks(20)
      .setAlpha(5)

    val alsModel = als.fit(data)

What I see is that if a userIndex or itemIndex has more than 1M rows, no features are calculated for that user or item, and no error is returned. Is this a known issue in Spark 2.1.0?

As a workaround, I currently randomSplit my data into 4 batches, fit ALS on each batch, and then average each feature element across the 4 batches. Is this a valid approach?
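
A minimal sketch of how the missing features could be confirmed, assuming `data` is the DataFrame passed to `als.fit` and `alsModel` is the fitted model from the snippet above. `ALSModel.userFactors` and `ALSModel.itemFactors` are DataFrames with an `id` and a `features` column; the object name `MissingFactors` and both helper functions are hypothetical and only serve the illustration. The sketch lists ids that occur in the input but have no row in the factor tables. It does not explain why they are missing.

```scala
import org.apache.spark.ml.recommendation.ALSModel
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object MissingFactors {

  // Hypothetical helper: distinct ids from `data(idCol)` that have
  // no matching `id` row in the given factor table.
  def idsWithoutFactors(data: DataFrame, idCol: String, factors: DataFrame): DataFrame =
    data.select(col(idCol)).distinct()
      .join(factors, col(idCol) === factors("id"), "left_anti")

  // Prints how many users and items from the training data got no factors.
  // Column names follow the issue text (userIndex, itemIndex).
  def report(data: DataFrame, model: ALSModel): Unit = {
    val missingUsers = idsWithoutFactors(data, "userIndex", model.userFactors)
    val missingItems = idsWithoutFactors(data, "itemIndex", model.itemFactors)
    println(s"users without factors: ${missingUsers.count()}")
    println(s"items without factors: ${missingItems.count()}")
  }
}
```

Calling `MissingFactors.report(data, alsModel)` after the fit would show whether the affected ids are really absent from the factor tables, and joining the result back to per-id row counts of `data` would show whether the 1M-row threshold holds.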