Re: Doubts related to Apache Blur

Gagan Juneja Thu, 21 Nov 2013 19:41:29 -0800

Naresh,
As far as Windows is concerned, maybe you can install Cygwin and try
running the Blur shell. I haven't tried it, but this should work.


Regards,
Gagan

On Fri, Nov 22, 2013 at 12:25 AM, Garrett Barton
<[email protected]> wrote:
> Naresh,
>
> I understand your problem set better now.
>
> As far as a data structure goes, I would define the following fields
> within a family called data:
>
> data.measure (Type String, not fieldLessIndexed)
> data.period (Type String, not fieldLessIndexed)
> data.pool1..n (Type String, not fieldLessIndexed)
> data.tags (Type String)
> data.cost (Type Long, not fieldLessIndexed)
>
> If you were in the Blur shell you would do something like:
> create -t myTable -c 4
> definecolumn myTable data measure String
> definecolumn myTable data period String
> definecolumn myTable data tags String
> definecolumn myTable data cost Long
>
> This will let you create as many pool columns as you want, and when you
> retrieve the row you will get the titles back by virtue of the column
> names. When you query against a tag, you would query against the tags
> field, into which you have also loaded your pool data.
> So rewriting your queries into working Blur queries (assuming your family
> is called 'data'; I don't know exactly what you're working on, so I'm sure
> you could come up with a better name) would look like:
>
>    Query 1: I need to get all rows with
>             data.measure:Cost AND data.period:Nov13 AND data.tags:Tag1
>             O/P = Row1, Row3
>    Query 2: get all rows with
>             data.measure:Cost AND data.period:Dec13 AND data.tags:Tag1
>             AND data.tags:TagA
>             O/P = Row4, Row5
>
>
> As far as getting it to work on Windows, I wouldn't wait for that to
> happen too soon.  If you download your favorite Linux distro's live
> install, install VirtualBox, and download the latest release of Blur, you
> could be running in under an hour (depending on bandwidth).
>
> I will reply to the JIRA ticket about Hadoop 2.x with my mods soon,
> hopefully with a patch to make things work.
>
> Take it easy,
> ~Garrett
>
>
>
> On Thu, Nov 21, 2013 at 12:48 PM, Naresh Yadav <[email protected]> wrote:
>
>> Hi,
>> Thanks very much, Garrett, for guiding me; that was really helpful.
>>
>> For Doubt *1*, I will definitely need your help once I start trying the
>> installation. Please share a document on this if possible.
>>
>> For Doubt *2*, I think I will be able to manage with a VM and will explore
>> that. It would have been better for me if somebody had already installed
>> it on Windows using .bat files, so that I could reuse them.
>>
>> For Doubt *3*, my actual case is like this (think of these as rows in an
>> Excel sheet; that is how my data will look):
>>
>> Row1 : Measure=Cost, Period=Nov13, Pool1=Tag1, Pool2=TagA, Cost=50
>> Row2 : Measure=Cost, Period=Nov13, Pool1=Tag2, Pool2=TagB, Cost=20
>> Row3 : Measure=Cost, Period=Nov13, Pool1=Tag1, Cost=20
>> Row4 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool3=TagP, Cost=150
>> Row5 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool4=TagQ, Cost=170
>> Row6 : Measure=Cost, Period=Dec13, Pool5=Tag1, Cost=120
>>
>>    Query 1: I need to get all rows with
>>             Measure:Cost, Period:Nov13, Tag1             O/P = Row1, Row3
>>    Query 2: get all rows with
>>             Measure:Cost, Period:Dec13, Tag1, TagA       O/P = Row4, Row5
>> So the challenge for me is the tag part, as the tags vary from row to row,
>> and while querying on them I will not know their column/pool names; any
>> row can have any N tags.
>>
>> Will such querying be supported? Or please suggest a better data model
>> for storing this case.
>>
>> Naresh
>>
>> On Thu, Nov 21, 2013 at 8:42 PM, Garrett Barton <[email protected]> wrote:
>>
>> > Welcome aboard!
>> >
>> > I can answer a few:
>> >
>> > 1. Yes, with some build flags and script tweaking that I can help with.
>> > I am running it now.
>> >
>> > 2. You will have to make startup scripts for Windows, and honestly I
>> > could not tell you if Blur would even run in a Windows environment.
>> > Have you considered doing dev in a VM? Or running a VM on your Windows
>> > machine, at least for hosting the Hadoop stack?
>> >
>> > 3. Are you familiar with Lucene itself?  You must query against a column
>> > (OK, not 100% true with Blur, but it seems like you have specified
>> > field1=x field2=y requirements). I am slightly confused by your queries,
>> > as they have a mix of column names and values that are in different
>> > columns in your example.
>> > Assuming your first query is cost:50 AND period:Nov13 AND pool1:Tag1 then
>> > sure. If you meant any kind of cost, then you simply omit that from the
>> > query in the first place.
>> > Assuming your second query is (cost:50 OR cost:150) AND period:Dec13 AND
>> > pool1:Tag1 AND pool2:Tag2 then sure, that works too.
>> >
>> > For the most part, if you can write a pretty standard SQL statement to
>> > query for your data as if it was in a database, that can be duplicated
>> > inside Blur.
>> >
>> >
>> > Millions of rows will be fine.  A single table with the column names you
>> > have described is fine; you will have to come up with some kind of unique
>> > identifier for each row to load into Blur (like a primary key in a
>> > database).
>> >
>> > Let me know if you have any more questions. :)
>> >
>> > ~Garrett
>> >
>> >
>> > On Thu, Nov 21, 2013 at 5:38 AM, Naresh Yadav <[email protected]>
>> > wrote:
>> >
>> > > hi,
>> > >
>> > > I have been reading about Apache Blur for the last day, and I found
>> > > it a perfect fit for my project. But I have some doubts:
>> > >
>> > > 1. Will I be able to use an existing Hadoop 2.0 cluster with the
>> > > latest version of Apache Blur?
>> > >
>> > > 2. My development environment is Windows, and Hadoop 2.0 supports
>> > > Windows, so my doubt is whether the latest version of Apache Blur will
>> > > work smoothly on Windows. Will I get startup scripts for Windows?
>> > >
>> > > 3. Here are 4 rows of my data, which I need to store in one table:
>> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag2
>> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag3
>> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag3
>> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag4
>> > >
>> > >    Query 1: I need to get all rows with
>> > >              Cost, Nov13, Tag1
>> > >    Query 2: get all rows with Cost, Dec13, Tag1, Tag2
>> > >      Will I be able to perform such queries? If yes, how should I
>> > > design this Blur table for this use case? Note: this table can hold
>> > > millions of rows with all historic data.
>> > >
>> > > Please help me; I am new to big data technologies. Your guidance will
>> > > give me direction to proceed.
>> > >
>> > > Thanks
>> > > Naresh
>> > >
>> >
>>
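
For illustration only, here is a minimal plain-Python sketch (not Blur code
and not the Blur API) of the data model Garrett describes above: every pool
value is also copied into one multi-valued tags field, so a query can match
a tag without knowing which pool column it came from. The helper names
to_record and search are hypothetical, and the rows are the six sample rows
from Naresh's second mail.

```python
# Sample rows from the thread; the pool column names vary from row to row.
ROWS = {
    "Row1": {"measure": "Cost", "period": "Nov13", "pool1": "Tag1", "pool2": "TagA", "cost": 50},
    "Row2": {"measure": "Cost", "period": "Nov13", "pool1": "Tag2", "pool2": "TagB", "cost": 20},
    "Row3": {"measure": "Cost", "period": "Nov13", "pool1": "Tag1", "cost": 20},
    "Row4": {"measure": "Cost", "period": "Dec13", "pool1": "Tag1", "pool2": "TagA", "pool3": "TagP", "cost": 150},
    "Row5": {"measure": "Cost", "period": "Dec13", "pool1": "Tag1", "pool2": "TagA", "pool4": "TagQ", "cost": 170},
    "Row6": {"measure": "Cost", "period": "Dec13", "pool5": "Tag1", "cost": 120},
}


def to_record(row):
    """Keep the original columns and add a 'tags' set holding every pool value."""
    record = dict(row)
    record["tags"] = {value for name, value in row.items() if name.startswith("pool")}
    return record


def search(records, measure, period, tags):
    """Mimic measure:X AND period:Y AND tags:T1 AND tags:T2 ... over the records."""
    wanted = set(tags)
    return [
        row_id
        for row_id, rec in records.items()
        if rec["measure"] == measure
        and rec["period"] == period
        and wanted <= rec["tags"]  # every requested tag must be present
    ]


RECORDS = {row_id: to_record(row) for row_id, row in ROWS.items()}

print(search(RECORDS, "Cost", "Nov13", ["Tag1"]))          # ['Row1', 'Row3']
print(search(RECORDS, "Cost", "Dec13", ["Tag1", "TagA"]))  # ['Row4', 'Row5']
```

Row6 is left out of the second result because it carries Tag1 but not TagA,
which matches the O/P listed in the thread.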
