> had a hard time setting it up

Mind sharing your experience in more detail? :-) If you already have a Hadoop cluster, it should be relatively straightforward to set up.
Tuning needs extra effort.

On Thu, Feb 4, 2016 at 12:58 PM, habitats <m...@habitats.no> wrote:
> Hello
>
> I have ~5 million text documents, each around 10-15 KB in size, and split
> into ~15 columns. I intend to do machine learning, and thus I need to
> extract all of the data at the same time, and potentially update everything
> on every run.
>
> So far I've just used JSON serializing, or simply cached the RDD to disk.
> However, I feel like there must be a better way.
>
> I have tried HBase, but I had a hard time setting it up and getting it to
> work properly. It also felt like a lot of work for my simple requirements.
> I want something /simple/.
>
> Any suggestions?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Recommended-storage-solution-for-my-setup-5M-items-10KB-pr-tp26150.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
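For reference, the JSON-serialization approach mentioned in the quoted message can be sketched in plain Python without a cluster. This is a minimal illustration, not the poster's actual code; the field names and documents are hypothetical stand-ins for the ~15 columns described:

```python
import json
import os
import tempfile

# Hypothetical documents standing in for the poster's ~15-column records.
docs = [{"id": i, "text": "example body", "label": "news"} for i in range(3)]

path = os.path.join(tempfile.mkdtemp(), "docs.jsonl")

# Write one JSON object per line (JSON Lines). Spark can load files in
# this layout directly, e.g. via its JSON data source, so the same files
# serve both plain-Python tooling and an RDD/DataFrame pipeline.
with open(path, "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Read everything back at once, matching the "extract all of the data
# at the same time" requirement from the question.
with open(path) as f:
    loaded = [json.loads(line) for line in f]

assert loaded == docs
```

The simplicity is the appeal here: no extra services to operate, unlike HBase. The trade-off is that updates mean rewriting files rather than mutating individual records in place.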