[https://issues.apache.org/jira/browse/DRILL-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440465#comment-17440465]
ASF GitHub Bot commented on DRILL-7978:
---------------------------------------
dzamo commented on pull request #2282:
URL: https://github.com/apache/drill/pull/2282#issuecomment-963159298
> Let's consider a real world use case: some fixed width log generated by a
database. Since the fields may be mashed together, there isn't a delimiter that
you can use to divide the fields. You _could_, however, use the logRegex reader
to do this. That point aside for the moment, the way I imagined someone using
this was that different configs could be set up and linked to workspaces such
that if a file was in the `mysql_logs` folder, it would use the mysql log
config, and if it was in the `postgres` folder, it would use another.
@cgivre This use case would still work after two `CREATE SCHEMA` statements
to set the names and data types, wouldn't it? The schemas would be applied to
every subsequent query.
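To make the quoted fixed-width log case concrete: with no delimiter, a reader has to cut each field out of a line by character position, so the field offsets and widths must come from somewhere. The sketch below is a minimal Python illustration of that slicing under stated assumptions, not the plugin's code; the `LOG_LAYOUT` offsets, the column names and the `split_fixed_width` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (column name, start offset, width) for each field.
# In a real reader these positions would have to come from configuration.
Layout = List[Tuple[str, int, int]]


def split_fixed_width(line: str, layout: Layout) -> Dict[str, str]:
    """Slice one fixed-width line into named string fields by position."""
    record = {}
    for name, start, width in layout:
        # There is no delimiter, so each field is cut out by its character
        # range; padding is stripped, and a short line just yields "".
        record[name] = line[start:start + width].strip()
    return record


# Hypothetical layout for a database log line.
LOG_LAYOUT: Layout = [
    ("log_date", 0, 10),
    ("severity", 10, 5),
    ("message", 15, 30),
]

line = "2021-11-08ERRORDisk quota exceeded"
print(split_fixed_width(line, LOG_LAYOUT))
```

Note that this step only yields strings; names and types are a separate concern from the layout.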
> My opinion here is that the goal should be to get the cleanest possible data
to the user without the user having to rely on CASTs and other complicating
factors.
Let's drop the CASTs, those aren't fun. So we're left with different ways a
user can specify column names and types.
1. With a `CREATE SCHEMA` against a directory.
2. With an inline schema to a table function.
3. With some plugin-specific format config that works for this plugin but
not for others.
Each one requires some effort, and each one gets you to a `select *` that
returns nicely typed results (is this claim actually true?), which is super
valuable. So shouldn't we avoid the quirky option 3 and commit to options 1
and 2 consistently wherever we can?
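Options 1 and 2 both supply column names and types independently of the file format. A minimal Python sketch of that idea, assuming a reader that yields string-valued records; `apply_schema`, `LOG_SCHEMA` and the column names are hypothetical, and this is not how Drill applies a provided schema internally.

```python
from datetime import date
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical provided schema: ordered (column name, type converter) pairs,
# standing in for what CREATE SCHEMA or an inline table-function schema supplies.
Schema = List[Tuple[str, Callable[[str], object]]]


def apply_schema(records: Iterable[Dict[str, str]],
                 schema: Schema) -> Iterator[Dict[str, object]]:
    """Convert string-valued records from any reader into typed rows."""
    for record in records:
        # Only the schema decides names and types; the reader is not consulted.
        yield {name: convert(record[name]) for name, convert in schema}


LOG_SCHEMA: Schema = [
    ("log_date", date.fromisoformat),
    ("severity", str),
    ("message", str),
]

# These strings could equally have come from a CSV reader or a fixed-width reader.
rows = [{"log_date": "2021-11-08", "severity": "ERROR", "message": "Disk quota exceeded"}]
print(list(apply_schema(rows, LOG_SCHEMA)))
```

Because `apply_schema` never looks at how the strings were produced, one schema mechanism can serve every reader, which is the consistency argument for options 1 and 2 over a plugin-specific config.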
--
> Fixed Width Format Plugin
> -------------------------
>
> Key: DRILL-7978
> URL: https://issues.apache.org/jira/browse/DRILL-7978
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Megan Foss
> Priority: Major
>
> Developing a format plugin to parse fixed-width files.
> Fixed Width Text File Definition:
> https://www.oracle.com/webfolder/technetwork/data-quality/edqhelp/Content/introduction/getting_started/configuring_fixed_width_text_file_formats.htm