Re: Doubts related to Apache Blur

Naresh Yadav Thu, 21 Nov 2013 21:21:53 -0800

Thanks very much, Garrett and Gagan, for your help. I now have a direction to
proceed in; I will try all of this and post here if I face any problems.


Naresh

On Fri, Nov 22, 2013 at 9:10 AM, Gagan Juneja <[email protected]> wrote:

> Naresh,
> As far as Windows is concerned, maybe you can install Cygwin and
> try running the Blur shell. I haven't tried it, but this should work.
>
> Regards,
> Gagan
>
> On Fri, Nov 22, 2013 at 12:25 AM, Garrett Barton
> <[email protected]> wrote:
> > Naresh,
> >
> > I understand your problem set better now.
> >
> > As far as a data structure goes, I would define the following fields
> > within a family called data:
> >
> > data.measure (Type String, not fieldLessIndexed)
> > data.period (Type String, not fieldLessIndexed)
> > data.pool1..n (Type String, not fieldLessIndexed)
> > data.tags (Type String)
> > data.cost (Type Long, not fieldLessIndexed)
> >
> > If you were in the Blur shell you would do something like:
> > create -t myTable -c 4
> > definecolumn myTable data measure String
> > definecolumn myTable data period String
> > definecolumn myTable data tags String
> > definecolumn myTable data cost Long
> >
> > This will let you create as many pool columns as you want, and when you
> > retrieve the row you will get the titles back by virtue of the column
> > names. When you query against a tag, you would query against the tags
> > field, into which you have also loaded your pool data.
> > So rewriting your queries into working Blur queries (assuming your family
> > is called 'data'; I don't know exactly what you're working on, so I'm
> > sure you could come up with a better name) would look like:
> >
> >    Query 1 : get all rows with
> >             data.measure:Cost AND data.period:Nov13 AND data.tags:Tag1
> >             O/P = Row1, Row3
> >    Query 2 : get all rows with
> >             data.measure:Cost AND data.period:Dec13 AND data.tags:Tag1
> >             AND data.tags:TagA
> >             O/P = Row4, Row5
> >
> >
> > As far as getting it to work on Windows, I wouldn't wait for that to
> > happen too soon.  If you download a live install of your favorite Linux
> > distro, install VirtualBox, and download the latest release of Blur, you
> > could be running in under an hour (depending on bandwidth).
> >
> > I will reply to the JIRA ticket about Hadoop 2.x with my mods soon,
> > hopefully with a patch to make things work.
> >
> > Take it easy,
> > ~Garrett
> >
> >
> >
> > On Thu, Nov 21, 2013 at 12:48 PM, Naresh Yadav <[email protected]> wrote:
> >
> >> Hi,
> >> Thanks very much, Garrett, for guiding me; that was really helpful.
> >>
> >> For Doubt *1*, I will definitely need your help once I start trying the
> >> installation. Please share a document on this if possible.
> >>
> >> For Doubt *2*, I think I will be able to manage with a VM and will explore
> >> that. It would have been better for me if somebody had already installed
> >> it on Windows using .bat files, so that I could reuse them.
> >>
> >> For Doubt *3*, my actual case is like this (assume these are rows in an
> >> Excel sheet; that is how my data will be):
> >>
> >> Row1 : Measure=Cost, Period=Nov13, Pool1=Tag1, Pool2=TagA, Cost=50
> >> Row2 : Measure=Cost, Period=Nov13, Pool1=Tag2, Pool2=TagB, Cost=20
> >> Row3 : Measure=Cost, Period=Nov13, Pool1=Tag1, Cost=20
> >> Row4 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool3=TagP, Cost=150
> >> Row5 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool4=TagQ, Cost=170
> >> Row6 : Measure=Cost, Period=Dec13, Pool5=Tag1, Cost=120
> >>
> >>    Query 1 : get all rows with
> >>             Measure:Cost, Period:Nov13, Tag1             O/P = Row1, Row3
> >>    Query 2 : get all rows with
> >>             Measure:Cost, Period:Dec13, Tag1, TagA       O/P = Row4, Row5
> >> So the challenge for me is the tag part, as the tags vary between rows,
> >> and while querying on them I will not know their column/pool names; I
> >> will only have N tags, which can appear in any row.
> >>
> >> Will such querying be supported? Or please suggest a better data model
> >> for storing this case.
> >>
> >> Naresh
> >>
> >> On Thu, Nov 21, 2013 at 8:42 PM, Garrett Barton <[email protected]> wrote:
> >>
> >> > Welcome aboard!
> >> >
> >> > I can answer a few:
> >> >
> >> > 1. Yes, with some build flags and script tweaking that I can help with.
> >> > I am running it now.
> >> >
> >> > 2. You will have to make startup scripts for Windows, and honestly I
> >> > could not tell you if Blur would even run in a Windows environment.
> >> > Have you considered doing dev in a VM? Or running a VM on your Windows
> >> > machine, at least for hosting the Hadoop stack?
> >> >
> >> > 3. Are you familiar with Lucene itself?  You must query against a column
> >> > (OK, not 100% true with Blur, but it seems like you have specified
> >> > field1=x field2=y requirements). I am slightly confused by your queries,
> >> > as they have a mix of column names and values that are in different
> >> > columns in your example.
> >> > Assuming your first query is cost:50 AND period:Nov13 AND pool1:Tag1,
> >> > then sure. If you meant any kind of cost, then you simply omit that
> >> > from the query in the first place.
> >> > Assuming your second query is (cost:50 OR cost:150) AND period:Dec13
> >> > AND pool1:Tag1 AND pool2:Tag2, then sure, that works too.
> >> >
> >> > For the most part, if you can write a pretty standard SQL statement to
> >> > query for your data as if it were in a database, that can be duplicated
> >> > inside Blur.
> >> >
> >> >
> >> > Millions of rows will be fine.  A single table with the column names
> >> > you have described is fine; you will have to come up with some kind of
> >> > unique identifier for each row to load into Blur (like a primary key in
> >> > a database).
> >> >
> >> > Let me know if you have any more questions. :)
> >> >
> >> > ~Garrett
> >> >
> >> >
> >> > On Thu, Nov 21, 2013 at 5:38 AM, Naresh Yadav <[email protected]>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I have been reading about Apache Blur for the last day, and I found
> >> > > it a perfect fit for my project. But I have some doubts:
> >> > >
> >> > > 1. Will I be able to use an existing Hadoop 2.0 cluster with the
> >> > > latest version of Apache Blur?
> >> > >
> >> > > 2. My development environment is Windows, and Hadoop 2.0 supports
> >> > > Windows, so will the latest version of Apache Blur work smoothly on
> >> > > Windows? Will I get startup scripts for Windows?
> >> > >
> >> > > 3. Here are 4 rows of my data which I need to store in one table:
> >> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag2
> >> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag3
> >> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag3
> >> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag4
> >> > >
> >> > >    Query 1 : get all rows with
> >> > >              Cost, Nov13, Tag1
> >> > >    Query 2 : get all rows with Cost, Dec13, Tag1, Tag2
> >> > >      Will I be able to perform such queries? If yes, how should I
> >> > > design this Blur table for this use case? Note: this table can hold
> >> > > millions of rows with all historic data.
> >> > >
> >> > > Please help me; I am new to big data technologies. Your guidance
> >> > > will give me direction to proceed.
> >> > >
> >> > > Thanks
> >> > > Naresh
> >> > >
> >> >
> >>
>
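
As a rough illustration of the data model Garrett suggests in the quoted
thread (every PoolN value is also loaded into a single multi-valued tags
field, so a tag can be queried without knowing its pool name), here is a
small plain-Python sketch. It does not call any Blur or Lucene API; the
helper names (to_record, build_query, search) are hypothetical and exist
only for this example, and the matching is a simplified stand-in for what
an indexed query would do. The rows are the six from Naresh's second mail.

```python
# Plain-Python illustration only: no Blur or Lucene API is used here.
# The six sample rows from the thread, keyed by a row id (Blur needs a
# unique id per row, like a primary key).
ROWS = {
    "Row1": {"measure": "Cost", "period": "Nov13", "pool1": "Tag1", "pool2": "TagA", "cost": 50},
    "Row2": {"measure": "Cost", "period": "Nov13", "pool1": "Tag2", "pool2": "TagB", "cost": 20},
    "Row3": {"measure": "Cost", "period": "Nov13", "pool1": "Tag1", "cost": 20},
    "Row4": {"measure": "Cost", "period": "Dec13", "pool1": "Tag1", "pool2": "TagA", "pool3": "TagP", "cost": 150},
    "Row5": {"measure": "Cost", "period": "Dec13", "pool1": "Tag1", "pool2": "TagA", "pool4": "TagQ", "cost": 170},
    "Row6": {"measure": "Cost", "period": "Dec13", "pool5": "Tag1", "cost": 120},
}


def to_record(row):
    """Keep the poolN columns and also copy their values into one 'tags' field."""
    record = dict(row)
    record["tags"] = {value for key, value in row.items() if key.startswith("pool")}
    return record


def build_query(family, measure, period, tags):
    """Build a query string in the field:value AND ... form used in the thread."""
    clauses = [f"{family}.measure:{measure}", f"{family}.period:{period}"]
    clauses += [f"{family}.tags:{tag}" for tag in tags]
    return " AND ".join(clauses)


def search(records, measure, period, tags):
    """Return the ids of rows matching measure AND period AND every given tag."""
    wanted = set(tags)
    return sorted(
        row_id
        for row_id, rec in records.items()
        if rec["measure"] == measure
        and rec["period"] == period
        and wanted <= rec["tags"]  # all requested tags must be present
    )


RECORDS = {row_id: to_record(row) for row_id, row in ROWS.items()}

print(build_query("data", "Cost", "Nov13", ["Tag1"]))
print(search(RECORDS, "Cost", "Nov13", ["Tag1"]))          # ['Row1', 'Row3']
print(search(RECORDS, "Cost", "Dec13", ["Tag1", "TagA"]))  # ['Row4', 'Row5']
```

The point of the tags set is that Row6, which stores Tag1 under Pool5, is
still found by a Tag1 query, while the original poolN columns remain on the
record so their names come back when the row is retrieved.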
