> had a hard time setting it up

Mind sharing your experience in more detail? :-) If you already have a Hadoop cluster, it should be relatively straightforward to set up.
Tuning needs extra effort.

On Thu, Feb 4, 2016 at 12:58 PM, habitats <m...@habitats.no> wrote:
> Hello
>
> I have ~5 million text documents, each around 10-15 KB in size, and split
> into ~15 columns. I intend to do machine learning, and thus I need to
> extract all of the data at the same time, and potentially update everything
> on every run.
>
> So far I've just used JSON serializing, or simply cached the RDD to disk.
> However, I feel like there must be a better way.
>
> I have tried HBase, but I had a hard time setting it up and getting it to
> work properly. It also felt like a lot of work for my simple requirements.
> I want something /simple/.
>
> Any suggestions?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Recommended-storage-solution-for-my-setup-5M-items-10KB-pr-tp26150.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
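For reference, the JSON-serialization approach mentioned in the quoted message can be sketched in plain Python without a cluster. This is a minimal illustration, not the poster's actual code; the field names and documents are hypothetical stand-ins for the ~15 columns described:

```python
import json
import os
import tempfile

# Hypothetical documents standing in for the poster's ~15-column records.
docs = [{"id": i, "text": "example body", "label": "news"} for i in range(3)]

path = os.path.join(tempfile.mkdtemp(), "docs.jsonl")

# Write one JSON object per line (JSON Lines). Spark can load files in
# this layout directly, e.g. via its JSON data source, so the same files
# serve both plain-Python tooling and an RDD/DataFrame pipeline.
with open(path, "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Read everything back at once, matching the "extract all of the data
# at the same time" requirement from the question.
with open(path) as f:
    loaded = [json.loads(line) for line in f]

assert loaded == docs
```

The simplicity is the appeal here: no extra services to operate, unlike HBase. The trade-off is that updates mean rewriting files rather than mutating individual records in place.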