Hi Charles,

A thought on debugging deserialization: don't do it in a query. Capture the 
JSON returned from a REST call, then write a simple unit test that 
deserializes it by itself from a string or file. Deserialization is a bit of 
a black art, and is really a problem separate from Drill itself.
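
For example, a minimal sketch (the response class and file name here are 
placeholders for whatever your plugin actually deserializes):

import java.io.File;
import org.junit.Test;
import com.fasterxml.jackson.databind.ObjectMapper;
import static org.junit.Assert.assertNotNull;

public class IsolatedDeserTest {
  @Test
  public void testDeserFromCapturedJson() throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // JSON captured from the REST call, saved beside the test.
    File json = new File("src/test/resources/captured-response.json");
    // Deserialize outside of any query: a failure here is a pure
    // Jackson problem, not a Drill problem. (MyResponse is a stand-in
    // for your plugin's POJO.)
    MyResponse response = mapper.readValue(json, MyResponse.class);
    assertNotNull(response);
  }
}

If that test fails, the fix lives in the POJO's Jackson annotations, and you 
can iterate without spinning up a Drillbit.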

As it turns out, for my "day job" I'm doing a POC using Drill to query 
SumoLogic. I took this as an opportunity to fill that gap you mentioned in our 
book: how to create a storage plugin. See [1]. This is a work in progress, but 
it has helped me build the planner-side stuff up to the batch reader, after 
which the work is identical to that for a format plugin.

The Sumo API is REST-based, but for now I'm using the clunky REST client 
available in the Sumo public repo because of some unfortunate details of the 
Sumo REST service when used for this purpose. (Sumo returns data as a set of 
key/value pairs, not as a fixed JSON schema [4].)
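
To make that concrete, each Sumo record arrives as generic pairs, roughly 
like this (shape simplified from memory, and illustrative only; see [4] for 
the real format):

{ "messages" : [
    { "map" : { "_messagetime" : "1573833000000", "host" : "web-01", "_raw" : "..." } },
    { "map" : { "_messagetime" : "1573833000042", "host" : "web-02", "_raw" : "..." } }
] }

The column names live in the data itself, so there is no fixed POJO that 
Jackson (or Retrofit) could bind the response to up front.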

Poking around elsewhere, it turns out someone wrote a very simple Presto 
connector for REST [2] using the Retrofit library from Square [3], which 
looks easy to use. If we create a generic REST plugin, we might want to look 
at how it was done in Presto. Presto requires an up-front schema, which 
Retrofit can provide. Drill, of course, does not require such a schema and so 
works with ad-hoc schemas, such as the one that Sumo's API provides.
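
For reference, a Retrofit service for a fixed-schema API is little more than 
an annotated interface. A minimal sketch (the service and response types here 
are invented for illustration, loosely following the sunrise/sunset API in 
Charles's example below):

import retrofit2.Call;
import retrofit2.http.GET;
import retrofit2.http.Query;

public interface SunriseService {
  // Retrofit builds the HTTP request from the annotations and binds
  // the response to the declared POJO.
  @GET("json")
  Call<SunriseResponse> lookup(@Query("lat") double lat,
                               @Query("lng") double lng,
                               @Query("date") String date);
}

// Assumed POJO matching the fixed schema:
class SunriseResponse {
  public String sunrise;
  public String sunset;
}

// Wiring it up (base URL is a placeholder):
//   Retrofit retrofit = new Retrofit.Builder()
//       .baseUrl("https://example-api.invalid/")
//       .addConverterFactory(JacksonConverterFactory.create())
//       .build();
//   SunriseService service = retrofit.create(SunriseService.class);

That up-front Call<SunriseResponse> declaration is exactly the schema Presto 
wants, and exactly what an ad-hoc API like Sumo's cannot supply.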

Actually, better than using a deserializer would be to use Drill's existing 
JSON parser to read data directly into value vectors. But that existing code 
has lots of tech debt. I've been working on a PR for a new version based on 
EVF, but that is a while off and won't help us today.

It is interesting to note that neither the JSON reader nor a generic REST API 
would work with the Sumo API because of its structure. I think the JSON 
reader would read an entire batch of Sumo results as a single record composed 
of a repeated Map, with elements being the key/value pairs. Not at all ideal.
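
A custom reader can instead pivot those pairs into proper columns. A rough 
sketch with plain Jackson (the "messages"/"map" field names follow the 
illustrative payload above, and the value-vector writing is elided):

import java.util.Iterator;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class KeyValuePivot {

  public void readBatch(String json) throws Exception {
    JsonNode messages = new ObjectMapper().readTree(json).get("messages");
    for (JsonNode message : messages) {
      // Each entry of the generic map becomes a column value in the
      // current row, instead of one element of a single repeated Map.
      Iterator<Map.Entry<String, JsonNode>> fields = message.get("map").fields();
      while (fields.hasNext()) {
        Map.Entry<String, JsonNode> field = fields.next();
        writeColumn(field.getKey(), field.getValue().asText());
      }
    }
  }

  // Placeholder: the real version writes to a value vector.
  private void writeColumn(String name, String value) { }
}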

So both the JSON reader and the REST API should eventually handle data 
formats that are generic (name/value pairs) rather than expressed in the 
structure of JSON objects (as Jackson and Retrofit require). That is a topic 
for later, but it is why the Sumo plugin has to be custom to Sumo's API for 
now.


Thanks,
- Paul


[1] https://github.com/paul-rogers/drill/wiki/Create-a-Storage-Plugin

[2] https://github.com/prestosql-rocks/presto-rest

[3] https://square.github.io/retrofit/

[4] https://help.sumologic.com/APIs/Search-Job-API/About-the-Search-Job-API

On Friday, November 15, 2019, 09:04:21 AM PST, Charles Givre <cgi...@gmail.com> wrote:
Hi Igor, 
Thanks for the advice.  I've been doing some digging and am still pretty stuck 
here.  Can you recommend any techniques for debugging the Jackson 
serialization/deserialization?  I added a unit test that serializes a query 
and then deserializes it, and that test fails.  I've tracked this back to a 
constructor not receiving the plugin config and then throwing an NPE. What I 
can't seem to figure out is where that is being called from and why.

Any advice would be greatly appreciated.  Code can be found here: 
https://github.com/apache/drill/pull/1892
Thanks,
-- C


> On Oct 12, 2019, at 3:27 AM, Igor Guzenko <ihor.huzenko....@gmail.com> wrote:
> 
> Hello Charles,
> 
> Looks like you found another new issue. Maybe I explained it unclearly, but
> my previous suggestion wasn't about the EXPLAIN PLAN construct, but rather:
> 1)  Use an http client like Postman, or simply a browser, to save the
> response of the requested rest service into a json file
> 2)  Try to debug Drill reading that file in order to compare how Calcite's
> conversion from the AST SqlNode to the RelNode tree differs for the existing
> dfs storage plugin from the same flow in your storage plugin.
> 
> From your last email I can tell that there is another issue with the class
> HttpGroupScan: at some point Drill tried to deserialize json into an
> instance of HttpGroupScan and the jackson library couldn't figure out how
> to do it. You probably missed a constructor with jackson metadata; for
> example, see the HiveScan operator:
> 
> @JsonCreator
> public HiveScan(@JsonProperty("userName") final String userName,
>     @JsonProperty("hiveReadEntry") final HiveReadEntry hiveReadEntry,
>     @JsonProperty("hiveStoragePluginConfig") final HiveStoragePluginConfig hiveStoragePluginConfig,
>     @JsonProperty("columns") final List<SchemaPath> columns,
>     @JsonProperty("confProperties") final Map<String, String> confProperties,
>     @JacksonInject final StoragePluginRegistry pluginRegistry) throws ExecutionSetupException {
>   this(userName,
>       hiveReadEntry,
>       (HiveStoragePlugin) pluginRegistry.getPlugin(hiveStoragePluginConfig),
>       columns,
>       null, confProperties);
> }
> 
> Kind regards,
> Igor
> 
> 
> 
> On Fri, Oct 11, 2019 at 10:53 PM Charles Givre <cgi...@gmail.com> wrote:
> 
>> Hi Igor,
>> Thanks for responding.  I'm not sure if this is what you intended, but I
>> looked at the JSON for the query plans and found something interesting.
>> For the SELECT * query, I'm getting the following when I try to run the
>> physical plan that it generates (without modification).  Do you think this
>> could be a related problem?
>> 
>> 
>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>> InvalidDefinitionException: Cannot construct instance of
>> `org.apache.drill.exec.store.http.HttpGroupScan` (no Creators, like default
>> construct, exist): cannot deserialize from Object value (no delegate- or
>> property-based Creator)
>> at [Source: (String)"{
>>  "head" : {
>>    "version" : 1,
>>    "generator" : {
>>      "type" : "ExplainHandler",
>>      "info" : ""
>>    },
>>    "type" : "APACHE_DRILL_PHYSICAL",
>>    "options" : [ ],
>>    "queue" : 0,
>>    "hasResourcePlan" : false,
>>    "resultMode" : "EXEC"
>>  },
>>  "graph" : [ {
>>    "pop" : "http-scan",
>>    "@id" : 2,
>>    "scanSpec" : {
>>      "uri" : "/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02"
>>    },
>>    "columns" : [ "`**`" ],
>>    "storageConfig" : {
>>      "type" : "http",
>>  "[truncated 766 chars]; line: 16, column: 5] (through reference chain:
>> org.apache.drill.exec.physical.PhysicalPlan["graph"]->java.util.ArrayList[0])
>> 
>> 
>> Please, refer to logs for more information.
>> 
>> [Error Id: 751b6d05-a631-4eca-9d83-162ab4fa839f on localhost:31010]
>> 
>> 
>>> On Oct 11, 2019, at 12:25 PM, Igor Guzenko <ihor.huzenko....@gmail.com> wrote:
>>> 
>>> Hello Charles,
>>> 
>>> You got the error from Apache Calcite at the planning stage while
>>> converting a SqlIdentifier to a RexNode. From your stack trace the
>>> conversion starts at
>>> DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:685) and goes to
>>> SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3694). I would
>>> suggest saving the json returned by rest locally as a file and debugging
>>> the same trace for a query on the json file. Then you can find the
>>> difference between the conversion of the sql identifier to rel for
>>> standard json reading and for your storage plugin.
>>> 
>>> Thanks, Igor
>>> 
>>> 
>>> On Fri, Oct 11, 2019 at 6:34 PM Charles Givre <cgi...@gmail.com> wrote:
>>> 
>>>> Hello all,
>>>> I decided to take the leap and attempt to implement a storage plugin.  I
>>>> found that a few people had started this, so I thought I'd complete a
>>>> simple generic HTTP/REST storage plugin. The use case would be to enrich
>>>> data sets with data that's available via public or internal APIs.
>>>> 
>>>> Anyway, I'm a little stuck and need some assistance.  I got the plugin to
>>>> successfully execute a star query and return the results correctly:
>>>> 
>>>> apache drill> SELECT * FROM
>>>> http.`/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02`;
>>>> 
>>>> 
>>>> +------------+------------+-------------+------------+----------------------+--------------------+-------------------------+-----------------------+-----------------------------+---------------------------+
>>>> |  sunrise   |   sunset   | solar_noon  | day_length | civil_twilight_begin | civil_twilight_end | nautical_twilight_begin | nautical_twilight_end | astronomical_twilight_begin | astronomical_twilight_end |
>>>> +------------+------------+-------------+------------+----------------------+--------------------+-------------------------+-----------------------+-----------------------------+---------------------------+
>>>> | 6:13:58 AM | 5:59:55 PM | 12:06:56 PM | 11:45:57   | 5:48:14 AM           | 6:25:38 PM         | 5:18:16 AM              | 6:55:36 PM            | 4:48:07 AM                  | 7:25:45 PM                |
>>>> +------------+------------+-------------+------------+----------------------+--------------------+-------------------------+-----------------------+-----------------------------+---------------------------+
>>>> 1 row selected (0.392 seconds)
>>>> 
>>>> However, when I attempt to select individual fields I get errors (see
>>>> below for the full stack trace).  I've walked through this with the
>>>> debugger, but it seems like the code is breaking before it hits my
>>>> storage plugin and I'm not sure what to do about it.  Here's a link to
>>>> the code:
>>>> https://github.com/cgivre/drill/tree/storage-http/contrib/storage-http
>>>> 
>>>> Any assistance would be greatly appreciated.  Thanks!!
>>>> 
>>>> 
>>>> 
>>>> apache drill> !verbose
>>>> verbose: on
>>>> apache drill> SELECT sunset FROM
>>>> http.`/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02`;
>>>> Error: SYSTEM ERROR: AssertionError: Field ordinal 1 is invalid for
>> type
>>>> '(DrillRecordRow[**])'
>>>> 
>>>> 
>>>> Please, refer to logs for more information.
>>>> 
>>>> [Error Id: d7bccd2f-73e6-40d7-9b8a-73a772f65c02 on 192.168.1.21:31010]
>>>> (state=,code=0)
>>>> java.sql.SQLException: SYSTEM ERROR: AssertionError: Field ordinal 1 is
>>>> invalid for  type '(DrillRecordRow[**])'
>>>> 
>>>> 
>>>> Please, refer to logs for more information.
>>>> 
>>>> [Error Id: d7bccd2f-73e6-40d7-9b8a-73a772f65c02 on 192.168.1.21:31010]
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538)
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:610)
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278)
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58)
>>>>      at
>>>> 
>> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1102)
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1113)
>>>>      at
>>>> 
>> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>>>>      at
>>>> 
>> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200)
>>>>      at
>>>> 
>> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>>>>      at
>>>> 
>> org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>>>>      at sqlline.Commands.executeSingleQuery(Commands.java:1008)
>>>>      at sqlline.Commands.execute(Commands.java:957)
>>>>      at sqlline.Commands.sql(Commands.java:921)
>>>>      at sqlline.SqlLine.dispatch(SqlLine.java:717)
>>>>      at sqlline.SqlLine.begin(SqlLine.java:536)
>>>>      at sqlline.SqlLine.start(SqlLine.java:266)
>>>>      at sqlline.SqlLine.main(SqlLine.java:205)
>>>> Caused by: org.apache.drill.common.exceptions.UserRemoteException:
>> SYSTEM
>>>> ERROR: AssertionError: Field ordinal 1 is invalid for  type
>>>> '(DrillRecordRow[**])'
>>>> 
>>>> 
>>>> Please, refer to logs for more information.
>>>> 
>>>> [Error Id: d7bccd2f-73e6-40d7-9b8a-73a772f65c02 on 192.168.1.21:31010]
>>>>      at
>>>> 
>> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>>>>      at
>>>> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>>>>      at
>>>> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>>>>      at
>>>> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
>>>>      at
>>>> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
>>>>      at
>>>> 
>> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>      at
>>>> 
>> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>      at
>>>> 
>> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>      at
>>>> 
>> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>>>>      at
>>>> 
>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>      at
>>>> 
>> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>      at
>>>> 
>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>      at
>>>> 
>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>      at
>>>> 
>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>>>>      at
>>>> 
>> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>>>      at
>>>> 
>> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>>>>      at
>>>> 
>> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>>>>      at
>>>> 
>> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>>>>      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>>>>      at
>>>> 
>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>>>>      at java.lang.Thread.run(Thread.java:748)
>>>> Caused by: org.apache.drill.exec.work.foreman.ForemanException:
>>>> Unexpected exception during fragment initialization: Field ordinal 1 is
>>>> invalid for  type '(DrillRecordRow[**])'
>>>>      at org.apache.drill.exec.work
>>>> .foreman.Foreman.run(Foreman.java:303)
>>>>      at .......(:0)
>>>> Caused by: java.lang.AssertionError: Field ordinal 1 is invalid for
>> type
>>>> '(DrillRecordRow[**])'
>>>>      at
>>>> org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3694)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.access$2200(SqlToRelConverter.java:217)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4765)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4061)
>>>>      at
>>>> org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4625)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3908)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:670)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3150)
>>>>      at
>>>> 
>> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:414)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:685)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:202)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:172)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:226)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:124)
>>>>      at
>>>> 
>> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:90)
>>>>      at org.apache.drill.exec.work
>>>> .foreman.Foreman.runSQL(Foreman.java:591)
>>>>      at org.apache.drill.exec.work
>>>> .foreman.Foreman.run(Foreman.java:276)
>>>>      ... 1 more
>>>> apache drill>