(Am I doing this mailing list thing right? Never used this ...)

I do not have a cluster.

Initially I tried to set up Hadoop+HBase+Spark, but after spending a week trying to get it to work, I gave up. I had a million problems with mismatching versions, and things working locally on the server but not programmatically through my client computer, and vice versa. There was /always something/ that did not work, one way or another.

And since I had to actually get things /done/ rather than become an expert in clustering, I gave up and just used simple serialization.
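By "simple serialization" I just mean writing the documents out as JSON with Spark, or persisting the RDD to disk between runs, roughly along these lines (a minimal sketch against the 1.x API; app name and paths are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.storage.StorageLevel

    // Rough sketch only; the app name and paths below are placeholders.
    val sc = new SparkContext(new SparkConf().setAppName("docs"))
    val sqlContext = new SQLContext(sc)

    // Plain JSON serialization of the ~5M documents (~15 columns each).
    val docs = sqlContext.read.json("/data/docs-json")
    docs.write.mode("overwrite").json("/data/docs-out")

    // Or just keep the RDD cached on disk for the duration of a run.
    val cached = docs.rdd.persist(StorageLevel.DISK_ONLY)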

Now I'm going to make a second attempt, but this time around I'll ask for help :p

--
Best regards,
Patrick Skjennum


On 04.02.2016 22.14, Ted Yu wrote:
bq. had a hard time setting it up

Mind sharing your experience in more detail? :-)
If you already have a Hadoop cluster, it should be relatively straightforward to set up.

Tuning needs extra effort.

On Thu, Feb 4, 2016 at 12:58 PM, habitats <m...@habitats.no> wrote:

    Hello

    I have ~5 million text documents, each around 10-15KB in size, and split
    into ~15 columns. I intend to do machine learning, and thus I need to
    extract all of the data at the same time, and potentially update
    everything on every run.

    So far I've just used JSON serialization, or simply cached the RDD to
    disk. However, I feel like there must be a better way.

    I have tried HBase, but I had a hard time setting it up and getting it to
    work properly. It also felt like a lot of work for my simple requirements.
    I want something /simple/.

    Any suggestions?



    --
    View this message in context:
    http://apache-spark-user-list.1001560.n3.nabble.com/Recommended-storage-solution-for-my-setup-5M-items-10KB-pr-tp26150.html
    Sent from the Apache Spark User List mailing list archive at Nabble.com.



