Re: Going it alone.

2020-04-16 Thread u...@moosheimer.com
A good idea in principle, but there are also reasons against it. Some companies forbid having their company name appear on the mailing list of an open-source project, or having their name associated with open source at all, even though they use open source. This would exclude those people who, in

Re: Spark submit OutOfMemory Error in local mode

2017-08-22 Thread u...@moosheimer.com
Since you didn't post any concrete information, it's hard to give you advice. Try increasing the executor memory (spark.executor.memory). If that doesn't help, give all the experts in the community a chance to help you by adding more details, such as the version, log file, source code, etc. Kind regards
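A minimal Python sketch of the suggested change: it only assembles a spark-submit command line with a larger memory setting and does not run anything. The application file my_job.py and the 6g size are hypothetical placeholders. One detail matters for the local mode named in the subject line: with --master local[*] the executor runs inside the driver JVM, so --driver-memory is usually the setting that actually enlarges the heap there, while spark.executor.memory only takes effect when separate executors run on a cluster.

```python
import shlex
from typing import List


def build_spark_submit(app: str, memory: str = "4g", master: str = "local[*]") -> List[str]:
    """Assemble a spark-submit command line with raised memory limits.

    In local mode the executor lives inside the driver JVM, so the driver
    memory is what counts there; spark.executor.memory is passed as well so
    the same command still makes sense on a cluster.
    """
    return [
        "spark-submit",
        "--master", master,
        "--driver-memory", memory,                    # heap of the driver JVM (used in local mode)
        "--conf", f"spark.executor.memory={memory}",  # heap per executor on a cluster
        app,
    ]


# "my_job.py" is a hypothetical application file.
cmd = build_spark_submit("my_job.py", memory="6g")
print(shlex.join(cmd))
```

Passing both settings keeps the command usable when the job later moves from local mode to a cluster.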

Re: Has anyone used CoreNLP from stanford for sentiment analysis in Spark? It does not work as desired for me.

2017-04-28 Thread u...@moosheimer.com
A really good one is vaderSentiment (https://github.com/cjhutto/vaderSentiment). Regards, Kay-Uwe Moosheimer On 28.04.2017 at 12:24, Alonso Isidoro Roman wrote: > I forked some time ago a twitter analyzer, but i think the best is to provide the original link

Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10

2017-02-08 Thread u...@moosheimer.com
Dear devs, is it possible to use Spark 2.0.2 (Scala 2.11) and consume messages from a Kafka 0.10.0.2 server running on Scala 2.10? I have been trying this for the last two days using createDirectStream and can't get any messages out of Kafka?! I'm using HDP 2.5.3 running kafka_2.10-0.10.0.2.5.3.0-37 and Spark
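The broker package name in the question packs several versions together. Assuming the usual pattern kafka_<Scala version>-<Kafka version>, followed here by an HDP build suffix, kafka_2.10-0.10.0.2.5.3.0-37 is Kafka 0.10.0 compiled with Scala 2.10. Clients talk to the broker over Kafka's network protocol, so the broker's Scala version does not have to match Spark's Scala 2.11; what matters is that the Spark side uses the spark-streaming-kafka-0-10 artifact built for Spark's own Scala version (_2.11 here), and that this 0.10 integration requires a broker at version 0.10.0 or newer. The Python sketch only decodes the name and checks that minimum; the helper names and the regular expression are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed naming pattern: kafka_<scala major.minor>-<kafka x.y.z>[.<suffix>]
_KAFKA_BUILD = re.compile(r"kafka_(\d+\.\d+)-(\d+\.\d+\.\d+)(?:\.(.+))?$")


class KafkaBuild(NamedTuple):
    scala: str                # Scala version the broker was compiled with
    kafka: Tuple[int, ...]    # first three Kafka version components, e.g. (0, 10, 0)
    suffix: Optional[str]     # everything after them (the HDP build here), if any


def parse_kafka_build(name: str) -> KafkaBuild:
    """Split a broker package name into Scala version, Kafka version and suffix."""
    match = _KAFKA_BUILD.match(name)
    if match is None:
        raise ValueError(f"unrecognised Kafka package name: {name!r}")
    scala, kafka, suffix = match.groups()
    return KafkaBuild(scala, tuple(int(part) for part in kafka.split(".")), suffix)


def meets_kafka_010_minimum(build: KafkaBuild) -> bool:
    """The Spark 0.10 Kafka integration needs a 0.10.0+ broker; the broker's Scala version plays no role."""
    return build.kafka >= (0, 10, 0)


build = parse_kafka_build("kafka_2.10-0.10.0.2.5.3.0-37")
print(build.scala, build.kafka, build.suffix, meets_kafka_010_minimum(build))
```

By this check the broker in the question is new enough, which suggests looking at the Spark-side artifact and consumer settings rather than at the broker's Scala 2.10 build.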

Re: Does Spark SQL support indexes?

2016-08-15 Thread u...@moosheimer.com
So you mean HBase, Cassandra, Hana, Elasticsearch and so on do not use indexes? Or are there some very interesting new concepts I've missed? Could you be more precise? ;-) Regards, Uwe On 15.08.2016 at 11:59, Gourav Sengupta wrote: > The world has moved in from indexes, materialized views,
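To make the disagreement concrete, a minimal Python sketch of what a secondary (hash) index buys compared with a full scan: the index maps each column value to the positions of its rows, so a lookup touches only the matching rows instead of every row. The table, column names and helper functions are hypothetical, and real databases keep such structures on disk and maintain them on every write, which this in-memory toy ignores.

```python
from collections import defaultdict

# Hypothetical sample table: one dict per row.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "US", "amount": 25},
    {"id": 3, "country": "DE", "amount": 7},
    {"id": 4, "country": "FR", "amount": 12},
]


def full_scan(rows, column, value):
    """Without an index, every row has to be inspected."""
    return [row for row in rows if row[column] == value]


def build_index(rows, column):
    """Secondary hash index: column value -> positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


def index_lookup(rows, index, value):
    """With the index, only the matching rows are touched."""
    return [rows[pos] for pos in index.get(value, [])]


country_index = build_index(rows, "country")
print([row["id"] for row in index_lookup(rows, country_index, "DE")])
```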

Re: ORC v/s Parquet for Spark 2.0

2016-07-27 Thread u...@moosheimer.com
Hi Gourav, Kudu (if you mean Apache Kudu, the project that originated at Cloudera) is an in-memory DB with data storage, while Parquet is "only" a columnar storage format. As I understand it, Kudu is a BI database meant to compete with Exasol or Hana (OK ... that's more of a wish :-). Regards, Uwe Kind regards
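A minimal Python sketch of what "columnar" means in the Parquet sense: the same records pivoted from one dict per row into one list per column, so a query that needs a single column reads only that list. The sample records are hypothetical; real Parquet or ORC files add row groups, encodings, compression and per-column statistics on top of this basic layout.

```python
# Hypothetical sample records, stored row by row.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "US", "amount": 25},
    {"id": 3, "country": "DE", "amount": 7},
]


def to_columnar(rows):
    """Pivot row records into one list per column (assumes every row has the same keys)."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


columns = to_columnar(rows)
# Summing one column reads a single list instead of every whole row.
print(sum(columns["amount"]))
```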

Re: Anyone has used Apache nifi

2016-06-16 Thread u...@moosheimer.com
Hi Mich, we use NiFi and it's really great. My company made an architecture blueprint based on NiFi and Spark. https://www.mysecondway.com/en/BOSON-Architecture Best regards, Kay-Uwe Moosheimer > On 16.06.2016 at 11:10, Mich Talebzadeh wrote

Re: Appropriate Apache Users List Uses

2016-02-09 Thread u...@moosheimer.com
I wouldn't expect this either. Very disappointing... -Kay-Uwe Moosheimer > On 09.02.2016 at 20:53, Ryan Victory wrote: > Yeah, a little disappointed with this, I wouldn't expect to be sent unsolicited mail based on my membership to this list. > -Ryan Victory

Re: Higher Processing times in Spark Streaming with kafka Direct

2015-12-04 Thread u...@moosheimer.com
Hi, processing time depends on what you are doing with the events. Increasing the number of partitions could help if you write more messages to the topic than Spark currently reads. Can you give more details? Best regards, Kay-Uwe Moosheimer > On
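With the Kafka direct approach each Kafka partition becomes exactly one Spark partition, so the partition count caps how many tasks can read the topic in parallel. A back-of-the-envelope Python sketch of the reasoning in the reply, assuming throughput scales linearly with partitions and that enough executor cores are available; the rates are made-up example numbers, not measurements, and the function names are hypothetical.

```python
import math


def min_partitions(incoming_per_sec: float, per_partition_per_sec: float) -> int:
    """Smallest partition count whose combined throughput keeps up with the input rate.

    Assumes throughput grows linearly with the number of partitions and that
    there are at least as many executor cores as partitions.
    """
    if per_partition_per_sec <= 0:
        raise ValueError("per-partition throughput must be positive")
    return max(1, math.ceil(incoming_per_sec / per_partition_per_sec))


def batch_processing_time(incoming_per_sec: float, batch_interval_s: float,
                          partitions: int, per_partition_per_sec: float) -> float:
    """Estimated seconds needed to process one micro-batch under the same assumptions."""
    records_per_batch = incoming_per_sec * batch_interval_s
    return records_per_batch / (partitions * per_partition_per_sec)


# Hypothetical numbers: 12,000 messages/s arrive, one partition handles 2,500/s.
rate, per_partition, interval = 12_000, 2_500, 10
needed = min_partitions(rate, per_partition)
for p in (4, needed):
    t = batch_processing_time(rate, interval, p, per_partition)
    status = "keeps up" if t <= interval else "falls behind"
    print(f"{p} partitions: {t:.1f}s per {interval}s batch ({status})")
```

With these numbers, 4 partitions need about 12 seconds for each 10-second batch, so batches queue up; 5 partitions bring it down to about 9.6 seconds. If processing time stays high even with spare partitions, the cost lies in what the job does with each event, as the reply points out.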