[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264320#comment-16264320 ] Arina Ielchiieva commented on DRILL-5771: - Merged into Apache master with commit id 7506cfbb5c8522d371c12dbdc2268d48a9449a48 > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: ready-to-commit > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263504#comment-16263504 ] ASF GitHub Bot commented on DRILL-5771: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1014 > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: ready-to-commit > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258117#comment-16258117 ] ASF GitHub Bot commented on DRILL-5771: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1014 @parthchandra done. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: ready-to-commit > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257790#comment-16257790 ] ASF GitHub Bot commented on DRILL-5771: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1014 @arina-ielchiieva can you resolve the merge conflict? > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: ready-to-commit > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250595#comment-16250595 ] ASF GitHub Bot commented on DRILL-5771: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1014 Thanks I took a look. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249355#comment-16249355 ] ASF GitHub Bot commented on DRILL-5771: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1014 @ilooner Paul has correctly described race condition issue. At planning time when we have defined plugin configuration we want to make sure it would stay the same during query execution. Relying on plugin configuration name is unsafe since config under this particular name can change and it can be that one fragment of the query is executed with one configuration and another with different (typical race condition issue). Regarding your last question, during physical plan creation I don't think we deserialize DrillTable, we rather deserialize query fragments (it can be seen from json profile). Also you can start looking, for example, from here: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/QueryWorkUnit.java#L59 > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248056#comment-16248056 ] ASF GitHub Bot commented on DRILL-5771: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1014 @paul-rogers Thanks for the explanation. Your explanation is in sync with Arina's descriptions in the ticket and with the code changes. The only point of confusion I have now is with regards to the Calcite things changes you mentioned. I was under the impression that the following happened: 1. FormatPlugins created a DrillTable, which held the format configuration. 1. The DrillTable is then encapsulated in a ScanOperator 1. The ScanOperator automagically gets serialized into the PhysicalPlan with the DrillTable and corresponding format configuration correctly. But it seems like there are more Calcite things at work here. @arina-ielchiieva could you please point me to the general flow I need to look at in order to understand how the FormatPluginConfig interacts with Calcite and gets serialized into the PhysicalPlan? > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247961#comment-16247961 ] ASF GitHub Bot commented on DRILL-5771: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1014 @ilooner, there are two phases of issues. The problem is in the planner, but we've been talking about serialization out to the workers. Here, we can learn from the work Arina did with dynamic UDFs, especially in the synchronization aspects as it has potential challenges similar to what we have here. Let's walk through the lifecycle. * User A defines a storage plugin config, call it P. * User B else runs a query Q that references P. * At the same time, user A changes P to P'. * Query Q is distributed to nodes. * At the same time, user A changes P again to produce P''. This is a classic distributed system synchronization problem. As we discussed, once the query is planned, the contents of the plugin definition are serialized as part of the physical plan for each fragment. As a result we "freeze" the contents of P at the time of serialization. The problem that Arina seems to be describing is during planning. Rather than taking a copy of P at some fixed point in time, then always using that copy; we seem to be holding a named reference to P. This introduces the obvious synchronization issues. So, what we should do is, once a query resolves P, the query takes a copy and uses that copy for the rest of query planning and execution. This resolves race conditions by saying that a query (on all nodes) uses the version of the plugin definition that existed at the moment that the reference to the plugin definition was first resolved by that query. The query will be oblivious to all subsequent updates. Since plugin definition is done as part of workspace and table definition in Calcite, this is likely to be a tricky change; it is not clear that Calcite offers a per-query context to hold such information. Arina probably can offer advice on this front. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usa
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247895#comment-16247895 ] ASF GitHub Bot commented on DRILL-5771: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1014 @paul-rogers I follow your explanation except for the last paragraph. The Drill UI let's me update storage plugins and their formatters at runtime (at least on my laptop) by going to **Storage** tab and clicking on **Update** for a plugin. It also let's me add storage plugins provided the necessary jars are in the classpath. I thought the goal of this change was to address the fact that such a runtime update cannot be propagated to the cluster atomically, so it is unreliable to reference a FormatPluginConfig by name (which was done previously). It is, however, safe to copy the entire FormatPluginConfig into the query plan as you said. Is this accurate? > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246879#comment-16246879 ] ASF GitHub Bot commented on DRILL-5771: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1014 Not sure the description here is entirely correct. Let's separate two concepts: the plugin (code) and the plugin definition (the stuff in JSON.) Plugin definitions are stored in ZK and retrieved by the Foreman. There may be some form of race condition in the Foreman, but that's not my focus here. The plugin *definition* is read by the Forman and serialized into the physical plan. Each worker reads the definition from the physical plan. For this reason, the worker's definition can never be out of date: it is the definition used when serializing the plan. Further, Drill allows table functions which provide query-time name/value pair settings for format plugin properties. The only way these can work is to be serialized along with the query. So, the actual serialized format plugin definition, included with the query, includes both the ZK information and the table function information. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246871#comment-16246871 ] ASF GitHub Bot commented on DRILL-5771: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1014 @arina-ielchiieva - The parts addressing DRILL-4640 and DRILL-5166 LGTM - I think the fix for DRILL-5771 LGTM but I would like write down what I think is happening and confirm with you that my understanding is correct. This is mostly just a learning exercise for me since I am not very familiar with this part of the code :). In DRILL-5771 there were two issues. ## Race Conditions With Format Plugins ### Issue The following used to happen before the fix: 1. When using an existing format plugin, the **FormatPlugin** would create a **DrillTable** with a **NamedFormatPluginConfig** which only contains the name of the format plugin to use. 1. The **ScanOperator** created for a **DrillTable** will contain the **NamedFormatPluginConfig** 1. When the **ScanOperators** are serialized in to the physical plan the serialized **ScanOperator** will only contain the name of the format plugin to use. 1. When a worker deserializes the physical plan to do a scan, he gets the name of the **FormatPluginConfig** to use. 1. The worker then looks up the correct **FormatPlugin** in the **FormatCreator** using the name he has. 1. The worker can get into trouble if the **FormatPlugins** he has cached in his **FormatCreator** is out of sync with the rest of the cluster. ### Fix Race conditions are eliminated because the **DrillTables** returned by the **FormatPlugins** no longer contain a **NamedFormatPluginConfig**, they contain the full **FormatPluginConfig** not just a name alias. So when a query is executed: 1. The ScanOperator contains the complete **FormatPluginConfig** 1. When the physical plan is serialized it contains the complete **FormatPluginConfig** for each scan operator. 1. When a worker node deserializes the ScanOperator it also has the complete **FormatPluginConfig** so it can reconstruct the **FormatPlugin** correctly, whereas previously the worker would have to do a lookup using the **FormatPlugin** name in the **FormatCreator** when the cache in the **FormatCreator** may be out of sync with the rest of the cluster. ## FormatPluginConfig Equals and HashCode ### Issue The **FileSystemPlugin** looks up **FormatPlugins** corresponding to a **FormatPluginConfig** in formatPluginsByConfig. However, the **FormatPluginConfig** implementations didn't override equals and hashCode. > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in has
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245783#comment-16245783 ] ASF GitHub Bot commented on DRILL-5771: --- Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1014 @ilooner can you please review this? > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.12.0 > > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to autoCorrectCorruptDates=>false. > 6. Named format is deserialized on the different node into parquet format > plugin with configuration (autoCorrectCorruptDates=>false). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222504#comment-16222504 ] ASF GitHub Bot commented on DRILL-5771: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/1014 DRILL-5771: Fix serDe errors for format plugins 1. Fix various serde issues for format plugins described in [DRILL-5771](https://issues.apache.org/jira/browse/DRILL-5771). 2. Throw meaninful exception instead of NPE when table is not found when table function is used. 3. Add unit tests for all format plugins for ensure serde is checked (physical plan is generated in json format and then submitted). 4. Fix physical plan submission on Windows ([DRILL-4640](https://issues.apache.org/jira/browse/DRILL-4640)). You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-5771 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1014.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1014 commit 72544d5aa9f0add3075c2eb174c49b355d3c85e2 Author: Arina Ielchiieva Date: 2017-10-26T13:09:00Z DRILL-5771: Fix serDe errors for format plugins 1. Fix various serde issues for format plugins described in DRILL-5771. 2. Throw meaninful exception instead of NPE when table is not found when table function is used. 3. Added unit tests for all format plugins for ensure serde is checked (physical plan is generated in json format and then submitted). 4. Fix physical plan submission on Windows (DRILL-4640). > Fix serDe errors for format plugins > --- > > Key: DRILL-5771 > URL: https://issues.apache.org/jira/browse/DRILL-5771 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > > Create unit tests to check that all storage format plugins can be > successfully serialized / deserialized. > Usually this happens when query has several major fragments. > One way to check serde is to generate physical plan (generated as json) and > then submit it back to Drill. > One example of found errors is described in the first comment. Another > example is described in DRILL-5166. > *Serde issues:* > 1. Could not obtain format plugin during deserialization > Format plugin is created based on format plugin configuration or its name. > On Drill start up we load information about available plugins (its reloaded > each time storage plugin is updated, can be done only by admin). > When query is parsed, we try to get plugin from the available ones, it we can > not find one we try to [create > one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144] > but on other query execution stages we always assume that [plugin exists > based on > configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162]. > For example, during query parsing we had to create format plugin on one node > based on format configuration. > Then we have sent major fragment to the different node where we used this > format configuration we could not get format plugin based on it and > deserialization has failed. > To fix this problem we need to create format plugin during query > deserialization if it's absent. > > 2. Absent hash code and equals. > Format plugins are stored in hash map where key is format plugin config. > Since some format plugin configs did not have overridden hash code and > equals, we could not find format plugin based on its configuration. > 3. Named format plugin usage > Named format plugins configs allow to get format plugin by its name for > configuration shared among all drillbits. > They are used as alias for pre-configured format plugiins. User with admin > priliges can modify them at runtime. > Named format plugins configs are used instead of sending all non-default > parameters of format plugin config, in this case only name is sent. > Their usage in distributed system may cause raise conditions. > For example, > 1. Query is submitted. > 2. Parquet format plugin is created with the following configuration > (autoCorrectCorruptDates=>true). > 3. Seralized named format plugin config with name as parquet. > 4. Major fragment is sent to the different node. > 5. Admin has changed parquet configuration for the alias 'parquet' on all > nodes to aut