What does Blockchain technology mean for Big Data? And what role will Hadoop/Spark play with it?

2017-12-18 Thread Gaurav1809
Hi All, Will Big Data tools and technologies work with Blockchain in the future? If you know of any likely use cases, please share. Thanks, Gaurav

[Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Gaurav1809
Hi All, I have a multi-node Spark cluster (1 master, 2 workers). The job reads data from a CSV file, and it works fine in local mode (local[*]). However, when the same job is run in cluster mode (spark://HOST:PORT), it is not able to read the file. I want to know how to reference

Where can I get a few GBs of sample data?

2017-09-28 Thread Gaurav1809
Hi All, I have set up a multi-node Spark cluster and am now looking for a good volume of data to test how it performs. Can anyone provide pointers as to where I can get a few GBs of free sample data? Thanks and regards, Gaurav

Cloudera - How to switch to the newly added Spark service (Spark2) from Spark 1.6 in CDH 5.12

2017-09-19 Thread Gaurav1809
Hello all, I downloaded CDH, and it comes with Spark 1.6. Following the step-by-step guide, I added Spark 2 to the services list. Now I can see both Spark 1.6 and Spark 2, but when I run spark-shell in a terminal window, it starts with Spark 1.6 only. How do I switch to Spark 2? What all _HOMEs or

How can I upgrade Spark 1.6 to 2.x in Cloudera QuickStart VM 5.7?

2017-09-12 Thread Gaurav1809
Hi All, I am using the Cloudera 5.7 QuickStart VM for learning purposes. It has Spark 1.6, and I want to upgrade Spark to 2.x. How can I do it? I don't think it will be as easy as downloading 2.x and replacing the older files. Please guide me if anyone has done this in the past. Steps would be highly helpful.

Re: Do we have anything for Deep Learning in Spark?

2017-07-05 Thread Gaurav1809
Thanks Roope for the inputs. On Wed, Jul 5, 2017 at 11:41 PM, Roope [via Apache Spark User List] <ml+s1001560n2882...@n3.nabble.com> wrote: > Microsoft Machine Learning Library for Apache Spark lets you run CNTK deep learning models on Spark. > https://github.com/Azure/mmlspark > The

Do we have anything for Deep Learning in Spark?

2017-06-19 Thread Gaurav1809
Hi All, Just as we have a machine learning library called ML, do we have anything for deep learning? If yes, please share the details. If not, what should the approach be? Thanks and regards, Gaurav Pandya

IoT in Spark

2017-05-18 Thread Gaurav1809
Hello gurus, How exactly does it work in real-world scenarios when it comes to reading data from IoT devices (for example, sensors at the in/out gates of a huge mall)? Can we do it in Spark? Do we need any other tool/utility (Kafka?) to read data from those sensors and then process it in Spark?

Has anyone used CoreNLP from Stanford for sentiment analysis in Spark? It does not work as desired for me.

2017-04-28 Thread Gaurav1809
Has anyone used CoreNLP from Stanford for sentiment analysis in Spark? It is not working as desired, or maybe I need to do some work that I am not aware of. Following is the example. 1). I look forward to interacting with kids of states governed by the congress. - POSITIVE 2). I look forward to

Shall I use Apache Zeppelin for data analytics & visualization?

2017-04-16 Thread Gaurav1809
Hi All, I am looking for a data visualization (and analytics) tool. My processing is done through Spark. There are many tools available, and Apache Zeppelin has been suggested to me. Can anybody shed some light on its power and capabilities when it comes to data analytics and

Re: Spark Streaming. Real-time save data and visualize on dashboard

2017-04-12 Thread Gaurav1809
Maybe you can ingest your data into ELK and use Kibana for live reporting. Of course, there may be better ways of doing this. Waiting for others to share their opinions. Thanks.

Any NLP library for sentiment analysis in Spark?

2017-04-11 Thread Gaurav1809
Hi All, I need to determine the sentiment of a given document (statement, paragraph, etc.). Is there any NLP library available with Apache Spark that I can use here? Any other pointers would be highly appreciated. Thanks in advance. Gaurav Pandya

With Twitter4j API, why am I not able to pull tweets with certain keywords?

2017-04-04 Thread Gaurav1809
I am using Spark Streaming with the twitter4j API to pull tweets. I am able to pull tweets for some keywords but not for others. Even if I explicitly tweet with those keywords, the API does not pull them. For some keywords it works smoothly. Has anyone encountered this issue before? Please suggest a solution.

How best can we store streaming data on dashboards for a real-time user experience?

2017-03-29 Thread Gaurav1809
I am getting streaming data and want to show it on dashboards in real time. How can we best handle this streaming data? Where should it be stored (DB, HDFS, or something else)? I want to give users a real-time analytics experience. Please suggest possible ways. Thanks.

Utilities for Twitter Analysis?

2017-03-28 Thread Gaurav1809
Hello all, I want to know the utilities (and complete pipeline) that I can use for Twitter analysis in Spark. I also want to know whether Kafka is needed or Spark Streaming alone will be able to do the work. Thanks and regards, Gaurav Pandya

How to load "kafka" as a data source

2017-03-23 Thread Gaurav1809
Hi All, I am running a simple Structured Streaming command in spark-shell, like this: val lines = (spark .readStream .format("kafka") .option("kafka.bootstrap.servers", "localhost:9092") .option("subscribe", "test") .load() .selectExpr("CAST(value AS STRING)")

Structured Streaming - Can I start using it?

2017-03-13 Thread Gaurav1809
I read in the Spark documentation that Structured Streaming is still ALPHA in Spark 2.1 and that the APIs are still experimental. Should I use it to rewrite my existing Spark Streaming code? It looks like it is not yet production ready. What happens if the Structured Streaming project gets withdrawn?

Which streaming platform is best? Kafka or Spark Streaming?

2017-03-09 Thread Gaurav1809
Hi All, Would you please let me know which streaming platform is best, be it for server log processing, social media feeds, or any such streaming data? I want to know the comparison between Kafka and Spark Streaming.

Server Log Processing - Regex or ElasticSearch?

2017-03-03 Thread Gaurav1809
Hello All, One small question, if you can help me out. I am working on server log processing in Spark for my organization. I am using regular expressions (regex) for pattern matching and then doing further analysis on the identified pieces: IP, username, date, etc. Is this a good approach? Shall I go

How to do dashboard reporting in Spark

2017-01-19 Thread Gaurav1809
Hi All, Once data is stored in DataFrames, what's next? Where do we go from there? Do we store the data in Hive or an RDBMS (Oracle, MySQL, Teradata)? How do we do dashboard reporting based on the data in the DataFrames? If there is any BI tool available in the Spark ecosystem, please suggest it.