[jira] [Commented] (DRILL-7125) REFRESH TABLE METADATA fails after upgrade from Drill 1.13.0 to Drill 1.15.0

2019-04-01 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807281#comment-16807281
 ] 

Kunal Khatua commented on DRILL-7125:
-------------------------------------

[~shamirwasia] does this require any Documentation? (cc: [~bbevens])

> REFRESH TABLE METADATA fails after upgrade from Drill 1.13.0 to Drill 1.15.0
> 
>
> Key: DRILL-7125
> URL: https://issues.apache.org/jira/browse/DRILL-7125
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> The REFRESH TABLE METADATA command worked successfully on Drill 1.13.0; 
> however, after upgrading to Drill 1.15.0 it sometimes fails with errors.
> {code:java}
> In sqlline, logging in as the regular user "alice" or as the Drill process 
> user "admin" gives the same error (permission denied).
> If it helps, here is also what I am seeing in sqlline.
> The error message contains random but valid user names other than the user 
> (alice) who logged in to refresh the metadata. It looks like during the 
> metadata refresh the drillbits incorrectly attempt the metadata generation as 
> some random user, which obviously does not have write access.
> 2019-03-12 15:27:20,564 [2377cdd9-dd6e-d213-de1a-70b50d3641d7:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2377cdd9-dd6e-d213-de1a-70b50d3641d7:0:0: State change requested RUNNING --> 
> FINISHED
> 2019-03-12 15:27:20,564 [2377cdd9-dd6e-d213-de1a-70b50d3641d7:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 2377cdd9-dd6e-d213-de1a-70b50d3641d7:0:0: State to report: FINISHED
> 2019-03-12 15:27:23,032 [2377cdb3-86cc-438d-8ada-787d2a84df9a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
> 2377cdb3-86cc-438d-8ada-787d2a84df9a issued by alice: REFRESH TABLE METADATA 
> dfs.root.`/user/alice/logs/hive/warehouse/detail`
> 2019-03-12 15:27:23,350 [2377cdb3-86cc-438d-8ada-787d2a84df9a:foreman] ERROR 
> o.a.d.e.s.parquet.metadata.Metadata - Failed to read 
> 'file://user/alice/logs/hive/warehouse/detail/.drill.parquet_metadata_directories'
>  metadata file
> java.io.IOException: 2879.5854742.1036302960 
> /user/alice/logs/hive/warehouse/detail/file1/.drill.parquet_metadata 
> (Permission denied)
> at com.mapr.fs.Inode.throwIfFailed(Inode.java:390) 
> ~[maprfs-6.1.0-mapr.jar:na]
> at com.mapr.fs.Inode.flushPages(Inode.java:505) 
> ~[maprfs-6.1.0-mapr.jar:na]
> at com.mapr.fs.Inode.releaseDirty(Inode.java:583) 
> ~[maprfs-6.1.0-mapr.jar:na]
> at 
> com.mapr.fs.MapRFsOutStream.dropCurrentPage(MapRFsOutStream.java:73) 
> ~[maprfs-6.1.0-mapr.jar:na]
> at com.mapr.fs.MapRFsOutStream.write(MapRFsOutStream.java:85) 
> ~[maprfs-6.1.0-mapr.jar:na]
> at 
> com.mapr.fs.MapRFsDataOutputStream.write(MapRFsDataOutputStream.java:39) 
> ~[maprfs-6.1.0-mapr.jar:na]
> at 
> com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2085)
>  ~[jackson-core-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.core.json.UTF8JsonGenerator.flush(UTF8JsonGenerator.java:1097)
>  ~[jackson-core-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2645)
>  ~[jackson-databind-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:381)
>  ~[jackson-core-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1726)
>  ~[jackson-core-2.9.5.jar:2.9.5]
> at 
> org.apache.drill.exec.store.parquet.metadata.Metadata_V3$ColumnMetadata_v3$Serializer.serialize(Metadata_V3.java:448)
>  ~[drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at 
> org.apache.drill.exec.store.parquet.metadata.Metadata_V3$ColumnMetadata_v3$Serializer.serialize(Metadata_V3.java:417)
>  ~[drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at 
> com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:119)
>  ~[jackson-databind-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:79)
>  ~[jackson-databind-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:18)
>  ~[jackson-databind-2.9.5.jar:2.9.5]
> at 
> com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
>  ~[jackson-databind-2.9.5.jar:2.9.5]
> at 
> 

[jira] [Commented] (DRILL-7125) REFRESH TABLE METADATA fails after upgrade from Drill 1.13.0 to Drill 1.15.0

2019-03-20 Thread Sorabh Hamirwasia (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797522#comment-16797522
 ] 

Sorabh Hamirwasia commented on DRILL-7125:
------------------------------------------

*Sorabh:*
I was looking into this customer issue where a *Refresh Metadata* query is 
failing after an upgrade from 1.13.0 to 1.15.0. Below are my findings:
 
1) Until 1.13, all metadata cache file access was done as the process user. 
This was enforced by DRILL-4143 [PR: 
Link|https://github.com/apache/drill/pull/470/files]. The way it was done is 
that any FileSystem instance passed to the Metadata class was ignored, and only 
a FileSystem instance created under the process user context was used.
 
2) In 1.14, DRILL-6331 [PR: 
Link|https://github.com/apache/drill/commit/c6549e58859397c88cb1de61b4f6eee52a07ed0c#diff-45bfe599bc468a9fd85319266b1222dd]
 did some refactoring to support filter/limit push down in cases where the Hive 
native Parquet reader is used. This change removed the fix made in DRILL-4143 
above. I guess it expected all callers to pass correct FileSystem object 
references, which was actually not the case.
 
With change 2) above, starting from 1.14 a metadata refresh call (both explicit 
and auto refresh) will not work if it is issued by a user other than the Drill 
process user (see the sketch below). Probably all the tests run as the process 
user, and hence the issue was never caught.
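To make the difference concrete, below is a minimal sketch of the two behaviors 
(class and method names are illustrative and do not match the actual Metadata 
class API): the pre-1.14 code discarded the caller's FileSystem and created one 
as the process user, while the 1.14+ code simply uses whatever FileSystem the 
caller passes in.

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative only; names do not match the real Metadata class API.
class MetadataCacheAccessSketch {

  // Pre-1.14 (DRILL-4143): the FileSystem handed in by the caller is ignored and
  // a new one is created under the Drillbit process user (the JVM login user), so
  // .drill.parquet_metadata files are always read/written with process-user rights.
  static FileSystem processUserFs(FileSystem ignoredCallerFs, Configuration conf)
      throws Exception {
    UserGroupInformation processUser = UserGroupInformation.getLoginUser();
    return processUser.doAs(
        (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
  }

  // 1.14+ (DRILL-6331): the caller-supplied FileSystem is used as-is. If a caller
  // passes a FileSystem created for the query user (e.g. alice), rewriting the
  // metadata cache file fails with "Permission denied".
  static void readBlockMeta(FileSystem callerFs, Path cacheFile) {
    // ... read cacheFile via callerFs, and rewrite it via callerFs if stale ...
  }
}
{code}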
 
Arina - Do you remember if there was any other reason why DRILL-6331 removed the 
fix that was made as part of DRILL-4143?
 
If not, then a quick fix to revert to the older behavior would again be to 
ignore the FileSystem instances passed into all the Metadata class APIs and just 
use the process user FileSystem instance. But this has security issues which 
were never fixed and need to be thought through. *For example:*
 
1) A REFRESH METADATA query can be executed by any user, even one who doesn't 
own the actual data.
2) The metadata file itself is not encrypted, hence any user can read it.
3) Even if we encrypt the metadata file so that only Drill can understand it, a 
user who doesn't have access to the actual data can still run a query that is 
served entirely from the metadata file (e.g. count(*) or min/max) and get access 
to that information.
 
*Arina:*
Hi Sorabh, 
 
The original logic was not removed. The Metadata classes started to accept a 
file system object, but for Drill tables it is still created under the process 
user; it's just that ParquetGroupScan is now responsible for this.
For the refresh metadata use case, I guess we need to create the file system 
under the process user in RefreshMetadataHandler [2].
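Roughly, such a change might look like the sketch below (illustrative only; it 
assumes Drill's ImpersonationUtil.getProcessUserName() and 
ImpersonationUtil.createFileSystem(user, conf) helpers, and simplifies the 
surrounding handler code):

{code:java}
import org.apache.drill.exec.store.dfs.DrillFileSystem;
import org.apache.drill.exec.util.ImpersonationUtil;
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: build the FileSystem used for (re)writing metadata cache
// files under the process user instead of the query user. Assumes the
// ImpersonationUtil helpers named above; the real handler code is more involved.
class RefreshMetadataHandlerSketch {
  DrillFileSystem metadataFileSystem(Configuration fsConf) {
    String processUser = ImpersonationUtil.getProcessUserName();
    return ImpersonationUtil.createFileSystem(processUser, fsConf);
  }
}
{code}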
 
Some background:
The Metadata class is responsible for creating metadata files and reading 
metadata from footers. While for Drill tables having a file system under the 
process user is acceptable, for Hive tables a special file system is created for 
each path [3].
That's why the responsibility for creating the proper file system was moved from 
the Metadata class to the group scan classes. I guess I just missed that 
RefreshMetadataHandler also calls methods from the Metadata class and passes a 
file system.
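A "special file system for each path" essentially means resolving the FileSystem 
from the path itself with a per-path configuration rather than reusing one shared 
instance. A rough illustration in plain Hadoop terms (not the actual 
HiveDrillNativeParquetScan code):

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough illustration of building a separate FileSystem per path, since Hive
// partitions/paths can live on different file systems or need different config.
class PerPathFileSystems {
  private final Map<Path, FileSystem> fsByPath = new HashMap<>();

  FileSystem fsFor(Path path, Configuration confForPath) throws IOException {
    FileSystem fs = fsByPath.get(path);
    if (fs == null) {
      // Path.getFileSystem resolves the scheme/authority of this particular path.
      fs = path.getFileSystem(confForPath);
      fsByPath.put(path, fs);
    }
    return fs;
  }
}
{code}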
 
[1] [https://github.com/apache/drill/blob/3d29faf81da593035f6bd38dd56d48e719afe7d4/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L241]
[2] [https://github.com/apache/drill/blob/5aa38a51d90998234b4ca46434ce225df72addc5/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java#L119]
[3] [https://github.com/apache/drill/blob/3d29faf81da593035f6bd38dd56d48e719afe7d4/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java#L207]
 
*Sorabh:*
Hi Arina,
Thanks for your input. With DRILL-6331 the expectation was that all callers of 
the Metadata class APIs need to pass a FileSystem created under the correct user 
context. But Chunhui's fix basically just made a change in the Metadata class 
itself to create a process-user FileSystem, ignoring the object passed by the 
caller (probably to avoid changing all the callers; kind of a hack).
 
I know that the RefreshMetadataHandler code doesn't create a process-user file 
system, but creating one there will still not fix the issue, because the handler 
calls getTable() [1], which internally calls WorkspaceSchemaFactory::create ---> 
isReadable ---> readBlockMeta using the query user (which need not be the 
process user). Since readBlockMeta is used by both the autoRefresh code path and 
this RefreshMetadata code path, internally it tries to actually write the 
metadata file if the modification time is obsolete. Since this write happens 
using the query user's file system object, it fails. Please see the stack trace 
below. Just creating the file system object with the process user in 
RefreshMetadataHandler [2] will only help resolve the case when the REFRESH 
command is executed by the process user.
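For clarity, below is a simplified approximation of the readBlockMeta flow 
described above (names and checks do not match the real Metadata class exactly); 
it only shows where the query user's FileSystem ends up doing the write:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Simplified approximation of the flow described above; the real Metadata class
// is more involved. The point is only where the write happens.
class ReadBlockMetaSketch {

  void readBlockMeta(FileSystem fs, Path cacheFile, Path tableDir) throws IOException {
    long cacheModTime = fs.getFileStatus(cacheFile).getModificationTime();
    long dirModTime = fs.getFileStatus(tableDir).getModificationTime();

    if (cacheModTime < dirModTime) {
      // Auto refresh: the cache is considered stale, so the metadata is
      // regenerated and rewritten. With fs created for the query user (e.g.
      // alice), this write fails with "Permission denied" as in the stack trace.
      rewriteMetadataCache(fs, cacheFile);
    }
    // ... otherwise the existing cache file is read and deserialized ...
  }

  private void rewriteMetadataCache(FileSystem fs, Path cacheFile) throws IOException {
    // fs.create(cacheFile, true) followed by JSON serialization of the metadata.
  }
}
{code}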
 
Now for all other queries when same *isReadable* code path