[
https://issues.apache.org/jira/browse/HBASE-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658571#action_12658571
]
Brian Beggs commented on HBASE-1064:
------------------------------------
{quote}
Brian: I don't exactly follow the below:
bq. Also the reason for the change in moving to the query string for some of
these items is that in order to retrieve the row/column/timestamp using the
path you are unable to have any directives in the path. Unless we wanted to get
into the thought of reserved words, which IMHO is a bad idea and complicates
the interface.
{quote}
So with this new implementation of the REST interface it's possible to query a
table, row, column, or timestamp directly using the path portion of the URL.
For example:
http://localhost:60050/testtable1/thesecondrow
Would retrieve the row thesecondrow from testtable1.
http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData
Would retrieve the column rowWithData:otherData from row thesecondrow in
testtable1.
The same thing works for timestamps:
http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData/1229121022233
Would retrieve the cell at timestamp 1229121022233, in column
rowWithData:otherData, in row thesecondrow, in table testtable1.
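To make that concrete, fetching one of these resources with curl would look
roughly like the following. This is a sketch only: it assumes the REST server
is running on localhost:60050 as in the notes below and uses the illustrative
table/row/column names from the examples above, and since the row/cell fetching
pieces are still on the TODO list it shows the intended shape of the calls
rather than something finished.
# fetch a single column of a row as JSON
curl -v -H "Accept: application/json" \
  http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData
# fetch the cell at a specific timestamp as XML
curl -v -H "Accept: text/xml" \
  http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData/1229121022233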
{quote}
bq. Now I think the real question that needs to be answered... is it necessary
or desirable to query out the row/column/timestamp data in this RESTful fashion
using the path?
{quote}
So my question is: is it desirable to have the interface work in such a way
that you are able to query out timestamp and individual cell data as in the
examples above? If the answer is no, I believe it will be relatively easy to
remove those parts of the interface and make this REST implementation match the
current one, though the ability to query out cells by identifier or by
timestamp will be lost. (I do not believe that functionality is available in
the current REST implementation anyway.)
If the answer is yes and we want to query in the /table/row/column/timestamp
fashion, that is the reason the directives (and by directive I mean things such
as fetching region data or using a scanner) were moved into the query string.
If we wanted to keep this interface and also allow querying with directives in
the path, I believe the logic required would make the code much more complex
than it already is and harder to maintain. And for what it's worth, I don't
feel it's the most straightforward implementation as it currently stands.
Adding additional complexity to the path would, I feel, make it harder to
maintain and add to, whereas putting these parameters in a query string
simplifies the addition of future code.
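To illustrate the split as it stands in this patch: the path addresses data,
and directives ride in the query string. A minimal sketch, again assuming the
server on localhost:60050 and the action parameter shown in the notes below:
# data addressing stays in the path
curl -v -H "Accept: application/json" \
  http://localhost:60050/testtable1/thesecondrow
# a directive such as fetching table metadata goes in the query string
curl -v -H "Accept: text/xml" \
  "http://localhost:60050/testtable1?action=metadata"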
To address Tom's questions:
{quote}
What advantage does this provide besides the perception of being more restful?
{quote}
Again, I'm not sure I have the full answer for this. I chose this
implementation for the selfish reasons outlined below, and I'm not really sure whether
the ability to query cells by identifier/timestamp is something that is truly
necessary for HBase. This is one of the questions I'm hoping someone who has
been working on the project can answer.
The reason I initially chose to build this implementation of the REST
interface on the patches from HBASE-814 and HBASE-815 was that I felt it would
be easier to separate the parsing/serialization code out of that version. I
also felt that modifying the current interface to emit JSON would take more
work than making this implementation emit XML.
I did not fully understand exactly how items were being retrieved through the
interface until I was some way into the project and began to notice the
differences between the two interfaces.
{quote}
If the proposed tablename/[row]/[cols]/[timestamp] interface is adopted, how do
you GET/PUT/POST/DELETE scanners?
{quote}
From my notes:
creating a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
POST -T - http://localhost:60050/TEST16?action=newscanner
//TODO fix up the scanner filters.
response:
xml:
<scanner>
<id>
2
</id>
</scanner>
json:
{"id":1}
Using a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
POST -T -
"http://localhost:60050/TEST16?action=scan&scannerid=<scannerID>&numrows=<num
rows to return>"
//TODO scanner action to return all rows between 2 row ID's
Closing a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
POST -T -
"http://localhost:60050/TEST16?action=closescanner&scannerid=<scannerId>"
{quote}
In short, a scanner is a stateful resource (like a table) - not an action. The
proposed model means that a table cannot have any "child resources" - just
rows. So you could potentially make a scanner a root-level type, and make an
interface like scanner/[id]/[opts]
So you'd POST scanner/?table=myTable&cols=....
then GET scanner/[id]
because the proposed table interface leaves no room for table/scanner/ -
scanner would be interpreted as a row ID.
I, for one, thought the old interface worked well because it allowed one to
access different resources on a given table. Granted, 'enable' and 'disable'
are actions, not resources.
{quote}
I believe these issues are addressed above. I will say that putting a
directive as the first item in the path is possible, though that leading path
segment would then always need to be present.
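For reference, the root-level scanner layout Tom describes would presumably
look something like the following. This is purely hypothetical: it is not what
the current patch implements, and the table name and parameter names are just
borrowed from the existing scanner notes.
# create a scanner as a root-level resource scoped to a table (hypothetical)
curl -v -H "Accept: application/json" -X POST \
  "http://localhost:60050/scanner/?table=TEST16&cols=rowWithData:"
# read rows from it by id, then delete it when finished (hypothetical)
curl -v -H "Accept: application/json" \
  "http://localhost:60050/scanner/2?numrows=10"
curl -v -X DELETE http://localhost:60050/scanner/2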
{quote}
Think about what other resources might be added to the interface (i.e. maybe
MapReduce jobs, Pig jobs, etc) - would those be resources of a specific table,
or root-level types? If you adopt the tablename/rowID/cols interface, it leaves
no room for child resources other than rows.
{quote}
Perhaps stack or someone else can comment on this further, but given the
paradigm of HBase and how a column-store database works, I have trouble
thinking of a case where a query against the database would not start from
/table/row, though I could see possible changes further down the path from
there.
Also, as far as Pig or MapReduce jobs go, I believe implementing those
interfaces will be taken care of by their respective projects. It's probably
best to stick with what works for HBase and let the other projects decide
what's best for them.
> HBase REST xml/json improvements
> --------------------------------
>
> Key: HBASE-1064
> URL: https://issues.apache.org/jira/browse/HBASE-1064
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: rest
> Reporter: Brian Beggs
> Attachments: json2.jar, RESTPatch-pass1.patch
>
>
> I've begun work on creating a REST based interface for HBase that can use
> both JSON and XML and would be extensible enough to add new formats down the
> road. I'm at a point with this where I would like to submit it for review
> and to get feedback as I continue to work towards new features.
> Attached to this issue you will find the patch for the changes to this point
> along with a necessary jar file for the JSON serialization. Also below you
> will find my notes on how to use what is finished with the interface to this
> point.
> This patch is based off of jira issues:
> HBASE-814 and HBASE-815
> I am interested on gaining feedback on:
> -what you guys think works
> -what doesn't work for the project
> -anything that may need to be added
> -code style
> -anything else...
> Finished components:
> -framework around parsing json/xml input
> -framework around serializing xml/json output
> -changes to exception handling
> -changes to the response object to better handle the serializing of output
> data
> -table CRUD calls
> -Full table fetching
> -creating/fetching scanners
> TODO:
> -fix up the filtering with scanners
> -row insert/delete operations
> -individual row fetching
> -cell fetching interface
> -scanner use interface
> Here are the wiki(ish) notes for what is done to this point:
> REST Service for HBASE Notes:
> GET /
> -retrieves a list of all the tables with their meta data in HBase
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/
> curl -v -H "Accept: application/json" -X GET -T - http://localhost:60050/
> POST /
> -Create a table
> curl -H "Content-Type: text/xml" -H "Accept: text/xml" -v -X POST -T -
> http://localhost:60050/newTable
> <table>
> <name>test14</name>
> <columnfamilies>
> <columnfamily>
> <name>subscription</name>
> <max-versions>2</max-versions>
> <compression>NONE</compression>
> <in-memory>false</in-memory>
> <block-cache>true</block-cache>
> </columnfamily>
> </columnfamilies>
> </table>
> Response:
> <status><code>200</code><message>success</message></status>
> JSON:
> curl -H "Content-Type: application/json" -H "Accept: application/json" -v -X
> POST -T - http://localhost:60050/newTable
> {"name":"test5", "column_families":[{
> "name":"columnfam1",
> "bloomfilter":true,
> "time_to_live":10,
> "in_memory":false,
> "max_versions":2,
> "compression":"",
> "max_value_length":50,
> "block_cache_enabled":true
> }
> ]}
> *NOTE* the compression value is an enum defined in class HColumnDescriptor.CompressionType
> GET /[table_name]
> -returns all records for the table
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/tablename
> curl -v -H "Accept: application/json" -X GET -T -
> http://localhost:60050/tablename
> GET /[table_name]
> -Parameter Action
> metadata - returns the metadata for this table.
> regions - returns the regions for this table
> curl -v -H "Accept: text/xml" -X GET -T -
> http://localhost:60050/pricing1?action=metadata
> Update Table
> PUT /[table_name]
> -updates a table
> curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X PUT -T -
> http://localhost:60050/pricing1
> <columnfamilies>
> <columnfamily>
> <name>subscription</name>
> <max-versions>3</max-versions>
> <compression>NONE</compression>
> <in-memory>false</in-memory>
> <block-cache>true</block-cache>
> </columnfamily>
> <columnfamily>
> <name>subscription1</name>
> <max-versions>3</max-versions>
> <compression>NONE</compression>
> <in-memory>false</in-memory>
> <block-cache>true</block-cache>
> </columnfamily>
> </columnfamilies>
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
> PUT -T - http://localhost:60050/pricing1
> {"column_families":[{
> "name":"columnfam1",
> "bloomfilter":true,
> "time_to_live":10,
> "in_memory":false,
> "max_versions":2,
> "compression":"",
> "max_value_length":50,
> "block_cache_enabled":true
> },
> {
> "name":"columnfam2",
> "bloomfilter":true,
> "time_to_live":10,
> "in_memory":false,
> "max_versions":2,
> "compression":"",
> "max_value_length":50,
> "block_cache_enabled":true
> }
> ]}
> Delete Table
> curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X DELETE -T -
> http://localhost:60050/TEST16
> creating a scanner
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
> POST -T - http://localhost:60050/TEST16?action=newscanner
> //TODO fix up the scanner filters.
> response:
> xml:
> <scanner>
> <id>
> 2
> </id>
> </scanner>
> json:
> {"id":1}
> Using a scanner
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X
> POST -T -
> "http://localhost:60050/TEST16?action=scan&scannerId=<scannerID>&numrows=<num
> rows to return>"
> This would be my first submission to an open source project of this size, so
> please, give it to me rough. =)
> Thanks.