Fwd: Indexing Big Data With or Without Solr

2014-05-06 Thread Furkan KAMACI
Your previous mail was not sent to the mailing list, so I am forwarding it.

-- Forwarded message --
From: Vineet Mishra 
Date: 2014-05-06 14:33 GMT+03:00
Subject: Re: Indexing Big Data With or Without Solr
To: Furkan KAMACI 


Hi Furkan,

No, not the metadata; I am planning to store sensor data in it. FYI,
http://www.freescale.com/webapp/sps/site/overview.jsp?code=SD_DATAFILEFORMAT
shows what the sensor data will look like. Moreover, you can assume the number
of columns extends to 200 and the number of rows is around 1 lakh (100,000)
per file, and I will have around 1 lakh such files.
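
As a rough sizing sketch of the figures above (taking 1 lakh = 100,000), the
totals can be worked out as below. The bytes-per-value figure is only an
inference: it assumes the "around 4 TB" mentioned in the first mail of this
thread refers to this same data set, which the thread does not confirm.

```python
# Rough sizing of the sensor data described above.
LAKH = 100_000

columns_per_file = 200
rows_per_file = 1 * LAKH
files = 1 * LAKH

# Assumption: the ~4 TB quoted at the start of the thread is this data set.
STATED_TOTAL_BYTES = 4 * 10**12


def dataset_size(columns, rows, n_files):
    """Return (total_rows, total_values) across all files."""
    total_rows = rows * n_files
    total_values = total_rows * columns
    return total_rows, total_values


total_rows, total_values = dataset_size(columns_per_file, rows_per_file, files)
implied_bytes_per_value = STATED_TOTAL_BYTES / total_values

print(f"total rows:   {total_rows:,}")
print(f"total values: {total_values:,}")
print(f"implied bytes per value: {implied_bytes_per_value}")
```

The point of the sketch is the order of magnitude: roughly ten billion rows
and two trillion individual values, which is what any index would have to
cover.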

Hardware spec: 6 Xeon processor machine, 2.13 GHz, 16 GB; HDD can be extended,
no issue.

Thanks!



On Tue, May 6, 2014 at 4:31 PM, Furkan KAMACI wrote:

> Hi Vineet;
>
> I remove those kinds of HTML tags and stop words (high-frequency terms are
> removed). However, you are right that sensor data and web data have
> different characteristics. Could you tell me what kind of information you
> store in each data item (geolocation, name, description, etc.)? Also, could
> you tell me more about your hardware infrastructure?
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-05-06 13:40 GMT+03:00 Vineet Mishra :
>
>> Hi Furkan,
>>
>> Indexing documents and indexing raw digital sensor data are completely
>> different. In your case, web documents will have many repeated tokens:
>> with web pages it is quite natural to see words such as *div, span, title,
>> style*, etc. repeated often. That is an ideal case for Solr, because the
>> inverted index benefits from such repetition. But if you look closely at my
>> requirement, I have sensor data; moreover, the data will be huge and there
>> is hardly any chance of repetition.
>>
>> What do you say? How suitable will Solr be for this?
>>
>> [Open question - expert advice needed]
>>
>> Thanks!
>>
>>
>>
>> On Tue, Apr 29, 2014 at 1:41 PM, Furkan KAMACI wrote:
>>
>>> Hi Vineet;
>>>
>>> Many millions of documents (web pages), with an average response time of
>>> less than 10 ms.
>>>
>>> Thanks;
>>> Furkan KAMACI
>>>
>>>
>>> 2014-04-29 10:55 GMT+03:00 Vineet Mishra :
>>>
>>>> Hi Furkan,
>>>>
>>>> Can you specify what type and size of data you have?
>>>> Moreover, what are your index size and query response time?
>>>>
>>>> Thanks
>>>> Vineet
>>>>
>>>> -- Forwarded message --
>>>> From: Furkan KAMACI 
>>>> Date: Tue, Apr 15, 2014 at 7:53 PM
>>>> Subject: Re: Indexing Big Data With or Without Solr
>>>> To: "solr-user@lucene.apache.org" 
>>>>
>>>>
>>>> Hi Vineet;
>>>>
>>>> I've been using SolrCloud for this kind of Big Data, and I think you
>>>> should consider using it. If you have any problems, you can ask here.
>>>>
>>>> Thanks;
>>>> Furkan KAMACI
>>>>
>>>>


Re: Indexing Big Data With or Without Solr

2014-04-29 Thread rulinma
mark. 






Re: Indexing Big Data With or Without Solr

2014-04-26 Thread Aman Tandon
Thanks, Vineet.

With Regards
Aman Tandon




Re: Indexing Big Data With or Without Solr

2014-04-23 Thread Vineet Mishra
I did it with Tomcat and a ZooKeeper ensemble; I will mail you the steps
shortly.

Cheers




Re: Indexing Big Data With or Without Solr

2014-04-18 Thread Aman Tandon
Vineet, please share the steps after you set up SolrCloud.
Are you using Jetty or Tomcat?



Re: Indexing Big Data With or Without Solr

2014-04-18 Thread Vineet Mishra
Thanks Furkan, I will definitely give it a try then.

Thanks again!






Re: Indexing Big Data With or Without Solr

2014-04-15 Thread Furkan KAMACI
Hi Vineet;

I've been using SolrCloud for this kind of Big Data, and I think you
should consider using it. If you have any problems, you can ask here.

Thanks;
Furkan KAMACI




Indexing Big Data With or Without Solr

2014-04-15 Thread Vineet Mishra
Hi All,

I have worked with Solr 3.5 to implement real-time search on about 100 GB of
data. That worked fine but was a little slow on complex queries (multiple
group/join queries).
Now I want to index some real Big Data (around 4 TB or even more). Can
SolrCloud be a solution for it? If not, what would be the best possible
solution in this case?
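
For readers weighing the SolrCloud option, here is a minimal sketch of how a
sharded collection is requested through the Collections API (action=CREATE
with name, numShards and replicationFactor). The host, collection name, shard
count and replication factor below are hypothetical placeholders, not values
from this thread, and the snippet only builds the URL; it does not send the
request.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL (the request is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical values: host, collection name and counts are placeholders.
url = create_collection_url("http://localhost:8983/solr", "sensor_data", 12, 2)
print(url)
```

Splitting the index into shards is what lets a data set larger than one
machine's capacity be indexed and queried across several nodes; the right
shard count depends on hardware and has to be found by testing.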

*Stats for the previous implementation:*
It was a master-slave architecture with multiple standalone instances
of Solr 3.5. There were around 12 Solr instances running on different
machines.

*Things to consider for the next implementation:*
Since all the data is sensor data, duplication and uniqueness of values
are factors to consider.
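
To make the duplication/uniqueness point concrete, here is a toy sketch of an
inverted index built over two tiny made-up data sets: one with heavily
repeated tokens and one where every value is distinct. The sample data and
function names are illustrative only, and treating each sensor reading as a
text token is a simplification.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def term_ratio(docs):
    """Distinct terms divided by total tokens (1.0 means no repetition)."""
    total = sum(len(tokens) for tokens in docs)
    return len(build_inverted_index(docs)) / total


# Made-up sample data for illustration only.
repeated_docs = [
    ["div", "span", "title"],
    ["div", "span", "style"],
    ["div", "title", "style"],
]
unique_docs = [
    ["0.9812", "1.0441", "0.9977"],
    ["1.0023", "0.9630", "1.0158"],
    ["0.9745", "1.0309", "0.9891"],
]

print("repeated-token term ratio:", term_ratio(repeated_docs))
print("unique-value term ratio:  ", term_ratio(unique_docs))
```

When the ratio approaches 1.0, nearly every token becomes its own term with a
single posting, so the term dictionary grows almost as fast as the data
itself. That is the concern with values that rarely repeat.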

*Really urgent; please treat this as a priority and suggest a set of feasible
solutions.*

Regards