Hi Nishanth,

Just found out where you work :) We had some discussion in
https://issues.apache.org/jira/browse/SPARK-2465. Having long IDs would
increase the communication cost, which may not be worth the benefit. Not
many companies have more than 1 billion users; if they do, maybe they can
mirror the implementation for their use case. I can suggest several
possible solutions:
1. Hash user IDs into integers before training. If the collision rate is
high and that is crucial for your business, you can recompute user features
from product features by solving least squares after training. This works
when the product IDs can be mapped to integers.

2. Make type aliases in ALS, so that you can easily mirror the
implementation to use long IDs and track future changes.

3. Make the ALS implementation use generic ID types. This would be the
best solution, but it requires some refactoring of the code.

Best,
Xiangrui

On Wed, Jan 14, 2015 at 1:04 PM, Nishanth P S <nishant...@gmail.com> wrote:
> Yes, we are close to having more than 2 billion users. In this case, what
> is the best way to handle this?
>
> Thanks,
> Nishanth
>
> On Fri, Jan 9, 2015 at 9:50 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> Do you have more than 2 billion users/products? If not, you can pair
>> each user/product ID with an integer (check RDD.zipWithUniqueId), use
>> them in ALS, and then join the original bigInt IDs back after
>> training. -Xiangrui
>>
>> On Fri, Jan 9, 2015 at 5:12 PM, nishanthps <nishant...@gmail.com> wrote:
>> > Hi,
>> >
>> > The userIds and productIds in my data are bigInts. What is the best
>> > way to run collaborative filtering on this data? Should I modify
>> > MLlib's implementation to support more types, or is there an easy way?
>> >
>> > Thanks!
>> > Nishanth
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-BigInteger-for-userId-and-productId-in-collaborative-Filtering-tp21072.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org
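P.S. To make suggestion 1 above concrete, here is a minimal sketch in plain
Python (no Spark) of hashing bigInt IDs into the non-negative 32-bit int
range that MLlib's ALS expects, and measuring the collision rate on your
own ID set so you can decide whether the least-squares recomputation pass
is needed. The hash_id function and the sample IDs are illustrative
assumptions, not MLlib API:

```python
def hash_id(big_id: int) -> int:
    """Fold a big integer ID into a non-negative 32-bit int.

    A simple multiplicative hash (Knuth's constant); any well-mixed
    hash into [0, 2**31) would do.
    """
    return (big_id * 2654435761) % (2**31)

def collision_rate(ids) -> float:
    """Fraction of IDs lost to hash collisions (0.0 means none)."""
    ids = list(ids)
    hashed = {hash_id(i) for i in ids}
    return 1.0 - len(hashed) / len(ids)

# Pretend these are 100,000 bigInt user IDs (well beyond 2**31).
ids = range(10**12, 10**12 + 100_000)
print(collision_rate(ids))  # -> 0.0 for this sample (no collisions)
```

If the measured rate is high, that is the case where you would recompute
user features from product features after training, as described above.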
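P.P.S. The zipWithUniqueId approach from my earlier reply in the quoted
thread can be sketched in plain Python like this (dicts stand in for the
RDD.zipWithUniqueId call and the two joins you would use in Spark; the
sample IDs and feature vectors are made up for illustration):

```python
# BigInt user IDs as they appear in the raw data (illustrative values).
big_user_ids = [10_000_000_017, 10_000_000_042, 10_000_000_099]

# Pair each bigInt ID with a small unique int
# (the analogue of RDD.zipWithUniqueId on the distinct IDs).
big_to_int = {b: i for i, b in enumerate(big_user_ids)}
int_to_big = {i: b for b, i in big_to_int.items()}

# Train ALS on the int IDs; pretend it produced these user features.
features_by_int = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}

# Join the original bigInt IDs back after training.
features_by_big = {int_to_big[i]: f for i, f in features_by_int.items()}
print(features_by_big[10_000_000_042])  # -> [0.3, 0.4]
```

In Spark the two dict lookups become joins on the (bigIntId, intId) pairs,
once to remap the ratings before training and once to restore the original
IDs afterward.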