Thanks, I will have a look.

Mich

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 7 June 2016 at 13:38, Jörn Franke <jornfra...@gmail.com> wrote:

> Solr is basically a text index, served largely from memory, with a lot of
> capabilities for language analysis and extraction (you can compare it to a
> Google for your tweets). The system itself has a lot of features and a
> complexity similar to big data systems. Its index files can be backed by
> HDFS. You can put the tweets directly into Solr without going via HDFS
> files.
>
> Carefully decide which fields to index, i.e. which ones you want to search.
> It does not make sense to index everything.
>
> On 07 Jun 2016, at 13:51, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> OK. So in a nutshell, for offline predictive work (as opposed to
> streaming), one can use Apache Flume to store Twitter data in HDFS and use
> Solr to query the data?
>
> This is what the Solr description says:
>
> Solr is a standalone enterprise search server with a REST-like API. You
> put documents in it (called "indexing") via JSON, XML, CSV or binary over
> HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary
> results.
>
> Thanks
>
> On 7 June 2016 at 12:39, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Well, I have seen the algorithms mentioned used for this. However, some
>> preprocessing through Solr makes sense: it takes care of synonyms,
>> homonyms, stemming, etc.
>>
>> On 07 Jun 2016, at 13:33, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Thanks Jorn,
>>
>> To start, I would like to explore how one can turn some of the data into
>> useful information.
>>
>> I would like to look at some trend analysis. A simple correlation shows
>> that the more a topic, say "organic food", is mentioned, the more people
>> are inclined to go for it. From that, one can deduce that organic food is
>> a potential growth area.
>>
>> I have all the infrastructure to ingest that data, such as Flume to store
>> it or Spark Streaming to do near-real-time work.
>>
>> Now I want to slice and dice that data for, say, organic food.
>>
>> I presume this is a typical question.
>>
>> You mentioned Spark ML (machine learning?). Is that something viable?
>>
>> Cheers
>>
>> On 7 June 2016 at 12:22, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Spark ML support vector machines or neural networks could be candidates.
>>> For unsupervised learning, it could be clustering.
>>> For graph analysis on the followers, you can easily use Spark GraphX.
>>> Keep in mind that each tweet contains a lot of metadata (location,
>>> followers, etc.) that is more or less structured.
>>> For unstructured text analytics (e.g. the tweet itself), I recommend
>>> Solr/Elasticsearch.
>>>
>>> However I am not sure what you want to do with the data exactly.
>>>
>>>
>>> On 07 Jun 2016, at 13:16, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> This is really a general question.
>>>
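>>> I use Spark to get Twitter data. I did some looking at it:
>>>
>>>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>>>     import org.apache.spark.streaming.twitter.TwitterUtils
>>>     val ssc = new StreamingContext(sparkConf, Seconds(2)) // 2-second batches
>>>     val tweets = TwitterUtils.createStream(ssc, None) // None: OAuth from twitter4j.oauth.* properties
>>>     val statuses = tweets.map(status => status.getText()) // keep only the tweet text
>>>     statuses.print()
>>>     ssc.start() // nothing is received until the context is started
>>>     ssc.awaitTermination()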
>>>
>>> Ok
>>>
>>> I can also use Apache Flume to store the data in an HDFS directory:
>>>
>>>     $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf \
>>>         -Dflume.root.logger=DEBUG,console -n TwitterAgent
>>>
>>> That stores the Twitter data in binary format in the HDFS directory.
>>>
>>> My question is pretty basic.
>>>
>>> What is the best tool/language to dig into that data, for example
>>> Twitter streaming data? I am getting all sorts of stuff coming in. Say I
>>> am only interested in certain topics, like sport. How can I separate the
>>> signal from the noise, and with what tool and language?
>>>
>>> Thanks
>>>
>>
>
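
For the original question above (keeping only tweets on a topic such as sport), the simplest starting point is probably a keyword filter, before reaching for Solr or Spark ML. The following is only a rough sketch in plain Scala: `TopicFilter`, `mentionsTopic` and the keyword list are hypothetical names made up for illustration, not part of Spark or twitter4j, and keyword matching will miss synonyms and misspellings.

```scala
// Hypothetical helper (not a Spark or twitter4j API): keyword-based topic filter.
object TopicFilter {
  // Assumed keyword list for "sport"; keywords must be lower-case.
  val sportKeywords: Set[String] = Set("football", "cricket", "tennis", "rugby")

  // Lower-case the tweet, split it on non-alphanumeric characters, and check
  // whether any resulting token is one of the keywords.
  def mentionsTopic(text: String, keywords: Set[String]): Boolean =
    text.toLowerCase.split("[^a-z0-9]+").exists(keywords.contains)
}

// With the `statuses` DStream from the original post, this could be applied as:
//   val sportTweets =
//     statuses.filter(t => TopicFilter.mentionsTopic(t, TopicFilter.sportKeywords))
//   sportTweets.print()
```

Because the predicate is an ordinary function, the same check can be reused on data already landed in HDFS; stemming and synonym handling are where a Solr analysis chain would take over.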

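For the trend question in the thread (how often a topic such as "organic food" is mentioned over time), a first slice could be a per-day count of matching tweets. This is a minimal sketch under stated assumptions: the `Tweet` case class and `TopicTrend.mentionsPerDay` are hypothetical, the tweets are assumed to be already parsed into a day and a text field, and it runs on a local collection rather than on Spark.

```scala
import java.time.LocalDate

// Hypothetical record type: only the two fields this sketch needs.
final case class Tweet(day: LocalDate, text: String)

object TopicTrend {
  // Count, per day, the tweets whose text contains the phrase
  // (case-insensitive), and return the counts ordered by date.
  def mentionsPerDay(tweets: Seq[Tweet], phrase: String): Seq[(LocalDate, Int)] = {
    val needle = phrase.toLowerCase
    tweets
      .filter(_.text.toLowerCase.contains(needle))
      .groupBy(_.day)
      .map { case (day, hits) => day -> hits.size }
      .toSeq
      .sortBy(_._1.toEpochDay) // sort on the epoch day to avoid needing an Ordering[LocalDate]
  }
}
```

A rising count over successive days would be the "more mentions" signal described in the thread; whether that translates into demand is a separate question the counts alone do not answer.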