[
https://issues.apache.org/jira/browse/HBASE-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658571#action_12658571
]
Brian Beggs commented on HBASE-1064:
------------------------------------
{quote}
Brian: I don't exactly follow the below:
bq. Also the reason for the change in moving to the query string for some of
these items is that in order to retrieve the row/column/timestamp using the
path you are unable to have any directives in the path. Unless we wanted to get
into the thought of reserved words, which IMHO is a bad idea and complicates
the interface.
{quote}
So with this new implementation of the REST interface it's possible to query a
table, row, column, or timestamp directly using the path portion of the URL.
For example:
http://localhost:60050/testtable1/thesecondrow
Would retrieve the row thesecondrow from testtable1.
http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData
Would retrieve the column rowWithData:otherData from row thesecondrow in
testtable1.
The same thing works for timestamps:
http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData/1229121022233
Would retrieve the cell at timestamp 1229121022233, in column
rowWithData:otherData, in row thesecondrow, in table testtable1.
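To make that concrete, fetching one of these resources with curl would look
roughly like the following. This is a sketch only: it assumes the REST server
is running on localhost:60050 as in the notes below and uses the illustrative
table/row/column names from the examples above, and since the row/cell fetching
pieces are still on the TODO list it shows the intended shape of the calls
rather than something finished.
# fetch a single column of a row as JSON
curl -v -H "Accept: application/json" \
  http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData
# fetch the cell at a specific timestamp as XML
curl -v -H "Accept: text/xml" \
  http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData/1229121022233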
{quote}
bq. Now I think the real question that needs to be answered... is it necessary
or desirable to query out the row/column/timestamp data in this RESTful fashion
using the path?
{quote}
So my question is: is it desirable to have the interface work in such a way
that you are able to query out timestamp and individual cell data as in the
examples above? If the answer is no, I believe it will be relatively easy to
remove those parts of the interface and make this REST implementation match the
current one, though the ability to query out cells by identifier or by
timestamp will be lost. (I do not believe that functionality is available in
the current REST implementation anyway.)
If the answer is yes and we want to query in the /table/row/column/timestamp
fashion, that is the reason the directives (and by directive I mean things such
as fetching region data or using a scanner) were moved into the query string.
If we wanted to keep this interface and also allow querying with directives in
the path, I believe the logic required would make the code much more complex
than it already is and harder to maintain. And for what it's worth, I don't
feel it's the most straightforward implementation as it currently stands.
Adding additional complexity to the path would, I feel, make it harder to
maintain and add to, whereas putting these parameters in a query string
simplifies the addition of future code.
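To illustrate the split as it stands in this patch: the path addresses data,
and directives ride in the query string. A minimal sketch, again assuming the
server on localhost:60050 and the action parameter shown in the notes below:
# data addressing stays in the path
curl -v -H "Accept: application/json" \
  http://localhost:60050/testtable1/thesecondrow
# a directive such as fetching table metadata goes in the query string
curl -v -H "Accept: text/xml" \
  "http://localhost:60050/testtable1?action=metadata"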
To address Tom's questions:
{quote}
What advantage does this provide besides the perception of being more restful?
{quote}
Again, I'm not sure I have the full answer for this. I chose this
implementation for the selfish reasons outlined below, and I'm not really sure whether
the ability to query cells by identifier/timestamp is something that is truly
necessary for HBase. This is one of the questions I'm hoping someone who has
been working on the project can answer.
The reason I initially chose to build this implementation of the REST
interface on the patches from HBASE-814 and HBASE-815 was that I felt it would
be easier to separate the parsing/serialization code out of that version. I
also felt that modifying the current interface to emit JSON would take more
work than making this implementation emit XML.
I did not fully understand exactly how items were being retrieved through the
interface until I was some way into the project and began to notice the
differences between the two interfaces.
{quote}
If the proposed tablename/[row]/[cols]/[timestamp] interface is adopted, how do
you GET/PUT/POST/DELETE scanners?
{quote}
From my notes:
creating a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
POST -T - http://localhost:60050/TEST16?action=newscanner
//TODO fix up the scanner filters.
response:
xml:
<scanner>
<id>
2
</id>
</scanner>
json:
{"id":1}
Using a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
POST -T -
"http://localhost:60050/TEST16?action=scan&scannerid=<scannerID>&numrows=<num
rows to return>"
//TODO scanner action to return all rows between 2 row ID's
Closing a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
POST -T -
"http://localhost:60050/TEST16?action=closescanner&scannerid=<scannerId>"
{quote}
In short, a scanner is a stateful resource (like a table) - not an action. The
proposed model means that a table cannot have any "child resources" - just
rows. So you could potentially make a scanner a root-level type, and make an
interface like scanner/[id]/[opts]
So you'd POST scanner/?table=myTable&cols=....
then GET scanner/[id]
because the proposed table interface leaves no room for table/scanner/ -
scanner would be interpreted as a row ID.
I, for one, thought the old interface worked well because it allowed one to
access different resources on a given table. Granted, 'enable' and 'disable'
are actions, not resources.
{quote}
I believe these issues are addressed above. I will say that putting a
directive as the first item in the path is possible, though that leading path
segment would then always need to be present.
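For reference, the root-level scanner layout Tom describes would presumably
look something like the following. This is purely hypothetical: it is not what
the current patch implements, and the table name and parameter names are just
borrowed from the existing scanner notes.
# create a scanner as a root-level resource scoped to a table (hypothetical)
curl -v -H "Accept: application/json" -X POST \
  "http://localhost:60050/scanner/?table=TEST16&cols=rowWithData:"
# read rows from it by id, then delete it when finished (hypothetical)
curl -v -H "Accept: application/json" \
  "http://localhost:60050/scanner/2?numrows=10"
curl -v -X DELETE http://localhost:60050/scanner/2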
{quote}
Think about what other resources might be added to the interface (i.e. maybe
MapReduce jobs, Pig jobs, etc) - would those be resources of a specific table,
or root-level types? If you adopt the tablename/rowID/cols interface, it leaves
no room for child resources other than rows.
{quote}
Perhaps stack or someone else can comment on this further, but given the
paradigm of HBase and how a column-store database works, I have trouble
thinking of a case where a query against the database would not start from
/table/row, though I could see possible changes further down the path from
there.
Also, as far as Pig or MapReduce jobs go, I believe implementing those
interfaces will be taken care of by their respective projects. It's probably
best to stick with what works for HBase and let the other projects decide
what's best for them.
> HBase REST xml/json improvements
> --------------------------------
>
> Key: HBASE-1064
> URL: https://issues.apache.org/jira/browse/HBASE-1064
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: rest
> Reporter: Brian Beggs
> Attachments: json2.jar, RESTPatch-pass1.patch
>
>
> I've begun work on creating a REST based interface for HBase that can use
> both JSON and XML and would be extensible enough to add new formats down the
> road. I'm at a point with this where I would like to submit it for review
> and to get feedback as I continue to work towards new features.
> Attached to this issue you will find the patch for the changes to this point
> along with a necessary jar file for the JSON serialization. Also below you
> will find my notes on how to use what is finished with the interface to this
> point.
> This patch is based off of jira issues:
> HBASE-814 and HBASE-815
> I am interested on gaining feedback on:
> -what you guys think works
> -what doesn't work for the project
> -anything that may need to be added
> -code style
> -anything else...
> Finished components:
> -framework around parsing json/xml input
> -framework around serializing xml/json output
> -changes to exception handling
> -changes to the response object to better handle the serializing of output
> data
> -table CRUD calls
> -Full table fetching
> -creating/fetching scanners
> TODO:
> -fix up the filtering with scanners
> -row insert/delete operations
> -individual row fetching
> -cell fetching interface
> -scanner use interface
> Here are the wiki(ish) notes for what is done to this point:
> REST Service for HBASE Notes:
> GET /
> -retrieves a list of all the tables with their meta data in HBase
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/
> curl -v -H "Accept: application/json" -X GET -T - http://localhost:60050/
> POST /
> -Create a table
> curl -H "Content-Type: text/xml" -H "Accept: text/xml" -v -X POST -T -
> http://localhost:60050/newTable
> <table>
> <name>test14</name>
> <columnfamilies>
> <columnfamily>
> <name>subscription</name>
> <max-versions>2</max-versions>
> <compression>NONE</compression>
> <in-memory>false</in-memory>
> <block-cache>true</block-cache>
> </columnfamily>
> </columnfamilies>
> </table>
> Response:
> <status><code>200</code><message>success</message></status>
> JSON:
> curl -H "Content-Type: application/json" -H "Accept: application/json" -v -X
> POST -T - http://localhost:60050/newTable
> {"name":"test5", "column_families":[{
> "name":"columnfam1",
> "bloomfilter":true,
> "time_to_live":10,
> "in_memory":false,
> "max_versions":2,
> "compression":"",
> "max_value_length":50,
> "block_cache_enabled":true
> }
> ]}
> *NOTE* the compression value is an enum defined in class HColumnDescriptor.CompressionType
> GET /[table_name]
> -returns all records for the table
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/tablename
> curl -v -H "Accept: application/json" -X GET -T -
> http://localhost:60050/tablename
> GET /[table_name]
> -Parameter Action
> metadata - returns the metadata for this table.
> regions - returns the regions for this table
> curl -v -H "Accept: text/xml" -X GET -T -
> http://localhost:60050/pricing1?action=metadata
> Update Table
> PUT /[table_name]
> -updates a table
> curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X PUT -T -
> http://localhost:60050/pricing1
> <columnfamilies>
> <columnfamily>
> <name>subscription</name>
> <max-versions>3</max-versions>
> <compression>NONE</compression>
> <in-memory>false</in-memory>
> <block-cache>true</block-cache>
> </columnfamily>
> <columnfamily>
> <name>subscription1</name>
> <max-versions>3</max-versions>
> <compression>NONE</compression>
> <in-memory>false</in-memory>
> <block-cache>true</block-cache>
> </columnfamily>
> </columnfamilies>
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
> PUT -T - http://localhost:60050/pricing1
> {"column_families":[{
> "name":"columnfam1",
> "bloomfilter":true,
> "time_to_live":10,
> "in_memory":false,
> "max_versions":2,
> "compression":"",
> "max_value_length":50,
> "block_cache_enabled":true
> },
> {
> "name":"columnfam2",
> "bloomfilter":true,
> "time_to_live":10,
> "in_memory":false,
> "max_versions":2,
> "compression":"",
> "max_value_length":50,
> "block_cache_enabled":true
> }
> ]}
> Delete Table
> curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X DELETE -T -
> http://localhost:60050/TEST16
> creating a scanner
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
> POST -T - http://localhost:60050/TEST16?action=newscanner
> //TODO fix up the scanner filters.
> response:
> xml:
> <scanner>
> <id>
> 2
> </id>
> </scanner>
> json:
> {"id":1}
> Using a scanner
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
> POST -T -
> "http://localhost:60050/TEST16?action=scan&scannerId=<scannerID>&numrows=<num
> rows to return>"
> This would be my first submission to an open source project of this size, so
> please, give it to me rough. =)
> Thanks.