Thanks very much, Garrett and Gagan, for your help. I now have a direction to proceed; I will try all this and post here if I face any problems.
Naresh

On Fri, Nov 22, 2013 at 9:10 AM, Gagan Juneja <[email protected]> wrote:
> Naresh,
> As far as Windows OS is concerned, maybe you can install Cygwin and
> try running the Blur shell. I haven't tried it, but this should work.
>
> Regards,
> Gagan
>
> On Fri, Nov 22, 2013 at 12:25 AM, Garrett Barton
> <[email protected]> wrote:
> > Naresh,
> >
> > I understand your problem set better now.
> >
> > As far as a data structure I would define the following fields within a
> > family called data:
> >
> > data.measure  (Type String, not fieldLessIndexed)
> > data.period   (Type String, not fieldLessIndexed)
> > data.pool1..n (Type String, not fieldLessIndexed)
> > data.tags     (Type String)
> > data.cost     (Type Long, not fieldLessIndexed)
> >
> > If you were in the Blur shell you would do something like:
> > create -t myTable -c 4
> > definecolumn myTable data measure String
> > definecolumn myTable data period String
> > definecolumn myTable data tags String
> > definecolumn myTable data cost Long
> >
> > This will let you create as many pool columns as you want, and when you
> > retrieve the row you will get the titles back by virtue of the column
> > names. When you query against a tag, you would query against the tags
> > field, into which you have also loaded your pool data.
> > So rewriting your queries into working Blur queries (assuming your family
> > is called 'data'; I don't know exactly what you're working on, so I'm sure
> > you could come up with a better name) would look like:
> >
> > Query 1: get all rows with
> >     data.measure:Cost AND data.period:Nov13 AND data.tags:Tag1
> >     O/P = Row1, Row3
> > Query 2: get all rows with
> >     data.measure:Cost AND data.period:Dec13 AND data.tags:Tag1 AND data.tags:TagA
> >     O/P = Row4, Row5
> >
> > As far as getting it to work in Windows, I wouldn't wait for that to
> > happen too soon.
> > If you download any favorite Linux distro live install, install
> > VirtualBox, and download the latest release of Blur, you could be running
> > in under an hour (depending on bandwidth).
> >
> > I will reply to the JIRA ticket about Hadoop 2.x with my mods soon,
> > hopefully with a patch to make things work.
> >
> > Take it easy,
> > ~Garrett
> >
> > On Thu, Nov 21, 2013 at 12:48 PM, Naresh Yadav <[email protected]> wrote:
> >
> >> Hi,
> >> Thanks much Garrett for guiding me, that was really helpful.
> >>
> >> For Doubt *1* I will definitely need your help once I start trying the
> >> installation. Please share a document on this if possible.
> >>
> >> For Doubt *2* I think I will be able to manage with a VM and will explore
> >> that. It would have been better for me if somebody had already installed
> >> it on Windows by making bat files, so that I could reuse them.
> >>
> >> For Doubt *3* my actual case is like this (assume these are rows in an
> >> Excel sheet; that is how my data will be):
> >>
> >> Row1 : Measure=Cost, Period=Nov13, Pool1=Tag1, Pool2=TagA, Cost=50
> >> Row2 : Measure=Cost, Period=Nov13, Pool1=Tag2, Pool2=TagB, Cost=20
> >> Row3 : Measure=Cost, Period=Nov13, Pool1=Tag1, Cost=20
> >> Row4 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool3=TagP, Cost=150
> >> Row5 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool4=TagQ, Cost=170
> >> Row6 : Measure=Cost, Period=Dec13, Pool5=Tag1, Cost=120
> >>
> >> Query 1: I need to get all rows with
> >>     Measure:Cost, Period:Nov13, Tag1          O/P = Row1, Row3
> >> Query 2: get all rows with
> >>     Measure:Cost, Period:Dec13, Tag1, TagA    O/P = Row4, Row5
> >> So the challenge for me is the tag parts, as they vary from row to row,
> >> and while querying on them I will not know their column/pool names; any
> >> row can simply have N tags.
> >>
> >> Will such querying be supported? Or please suggest a better data model
> >> for storing this case.
> >>
> >> Naresh
> >>
> >> On Thu, Nov 21, 2013 at 8:42 PM, Garrett Barton <[email protected]> wrote:
> >>
> >> > Welcome aboard!
> >> >
> >> > I can answer a few:
> >> >
> >> > 1. Yes, with some build flags and script tweaking that I can help with.
> >> > I am running it now.
> >> >
> >> > 2. You will have to make startup scripts for Windows, and honestly I
> >> > could not tell you if Blur would even run in a Windows environment.
> >> > Have you considered doing dev in a VM? Or running a VM on your Windows
> >> > machine, at least for hosting the Hadoop stack?
> >> >
> >> > 3. Are you familiar with Lucene itself? You must query against a column
> >> > (OK, not 100% true with Blur, but it seems like you have specified
> >> > field1=x field2=y requirements). I am slightly confused by your queries,
> >> > as they have a mix of column names and values that are in different
> >> > columns in your example.
> >> > Assuming your first query is cost:50 AND period:Nov13 AND pool1:Tag1,
> >> > then sure. If you meant any kind of cost, then you simply omit that from
> >> > the query in the first place.
> >> > Assuming your second query is (cost:50 OR cost:150) AND period:Dec13 AND
> >> > pool1:Tag1 AND pool2:Tag2, then sure, that works too.
> >> >
> >> > For the most part, if you can write a pretty standard SQL statement to
> >> > query for your data as if it was in a database, that can be duplicated
> >> > inside Blur.
> >> >
> >> > Millions of rows will be fine. A single table with the column names you
> >> > have described is fine; you will have to come up with some kind of
> >> > unique identifier for each row to load into Blur (like a primary key in
> >> > a database).
> >> >
> >> > Let me know if you have any more questions.
> >> > :)
> >> >
> >> > ~Garrett
> >> >
> >> > On Thu, Nov 21, 2013 at 5:38 AM, Naresh Yadav <[email protected]> wrote:
> >> >
> >> > > hi,
> >> > >
> >> > > I have been reading about Apache Blur for the last day, and I found
> >> > > it a perfect fit for my project. But I have some doubts:
> >> > >
> >> > > 1. Will I be able to use an existing Hadoop 2.0 cluster with the
> >> > > latest version of Apache Blur?
> >> > >
> >> > > 2. My development environment is Windows, and Hadoop 2.0 supports
> >> > > Windows, so will the latest version of Apache Blur work smoothly on
> >> > > Windows? Will I get startup scripts for Windows?
> >> > >
> >> > > 3. Here are 4 rows of my data which I need to store in one table:
> >> > > Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag2
> >> > > Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag3
> >> > > Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag3
> >> > > Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag4
> >> > >
> >> > > Query 1: I need to get all rows with Cost, Nov13, Tag1
> >> > > Query 2: get all rows with Cost, Dec13, Tag1, Tag2
> >> > > Will I be able to perform such queries? If yes, how should I design
> >> > > this Blur table for this use case? Note: in this table there can be
> >> > > millions of rows with all historic data.
> >> > >
> >> > > Please help me; I am new to big data technologies. Your guidance will
> >> > > give me direction to proceed.
> >> > >
> >> > > Thanks
> >> > > Naresh
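
For anyone following the thread, here is a rough sketch of the tags idea Garrett describes: every pool value is copied into a single multi-valued tags field, so a query never needs to know which pool column a tag came from. This is plain Python only, not the Blur or Lucene API; the names ROWS, build_record and query are made up for illustration, and the rows are the six sample rows from the thread.

```python
# Illustrative only: plain Python, NOT Blur or Lucene API calls.
# ROWS, build_record and query are hypothetical names for this sketch.

# The six sample rows from the thread, keyed by a unique row id.
ROWS = {
    "Row1": {"measure": "Cost", "period": "Nov13", "cost": 50,
             "pools": {"pool1": "Tag1", "pool2": "TagA"}},
    "Row2": {"measure": "Cost", "period": "Nov13", "cost": 20,
             "pools": {"pool1": "Tag2", "pool2": "TagB"}},
    "Row3": {"measure": "Cost", "period": "Nov13", "cost": 20,
             "pools": {"pool1": "Tag1"}},
    "Row4": {"measure": "Cost", "period": "Dec13", "cost": 150,
             "pools": {"pool1": "Tag1", "pool2": "TagA", "pool3": "TagP"}},
    "Row5": {"measure": "Cost", "period": "Dec13", "cost": 170,
             "pools": {"pool1": "Tag1", "pool2": "TagA", "pool4": "TagQ"}},
    "Row6": {"measure": "Cost", "period": "Dec13", "cost": 120,
             "pools": {"pool5": "Tag1"}},
}


def build_record(row):
    """Flatten a row into 'data' family fields and copy pool values into 'tags'."""
    record = {"measure": row["measure"], "period": row["period"], "cost": row["cost"]}
    record.update(row["pools"])                  # keep pool1..n so column titles come back
    record["tags"] = set(row["pools"].values())  # one multi-valued field used for querying
    return record


def query(records, measure, period, tags):
    """Row ids where measure and period match and ALL requested tags are present."""
    wanted = set(tags)
    return sorted(
        row_id
        for row_id, rec in records.items()
        if rec["measure"] == measure and rec["period"] == period and wanted <= rec["tags"]
    )


RECORDS = {row_id: build_record(row) for row_id, row in ROWS.items()}

# data.measure:Cost AND data.period:Nov13 AND data.tags:Tag1
print(query(RECORDS, "Cost", "Nov13", ["Tag1"]))
# data.measure:Cost AND data.period:Dec13 AND data.tags:Tag1 AND data.tags:TagA
print(query(RECORDS, "Cost", "Dec13", ["Tag1", "TagA"]))
```

The two calls correspond to Query 1 and Query 2 in the thread. Row6 carries Tag1 in pool5 and is still reachable through tags, which is the point of duplicating the pool values there.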
