[ https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lvhu updated HIVE-27590:
------------------------
    Environment: Any  (was: {code:java}
//code placeholder
{code})

> Make LINES TERMINATED BY work when creating table
> -------------------------------------------------
>
>                 Key: HIVE-27590
>                 URL: https://issues.apache.org/jira/browse/HIVE-27590
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, SQL
>    Affects Versions: 3.1.3
>         Environment: Any
>            Reporter: lvhu
>            Assignee: lvhu
>            Priority: Major
>
> *Currently, the only way to set a custom line delimiter for a Hive table is
> to write a custom InputFormat and reference it when creating the table:*
> {code:java}
> // Java: a custom InputFormat that hard-codes one record delimiter
> package abc.hive;
>
> public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
>     implements JobConfigurable {
>   ...
> }
>
> -- HiveQL: reference the custom InputFormat in the table definition
> create table test (
>   id string,
>   name string
> )
> STORED AS
>   INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> {code}
> Each distinct record delimiter requires its own custom TextInputFormat
> implementation.
> Unfortunately, the ideal approach is not supported yet:
> {code:java}
> create table test (
>   id string,
>   name string
> )
> row format delimited fields terminated by '\t'  -- supported
> LINES TERMINATED BY '|@|';                      -- not supported
> {code}
> I have a solution that supports setting the line delimiter when creating a
> table, just like above.
> *1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*
> HiveTextInputFormat reads a <pathToDelimiter> file and selects the record
> delimiter for each input file based on the prefix of the file's path.
> {code:java}
> public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
>     implements JobConfigurable {
>   ....
>   public RecordReader<LongWritable, Text> getRecordReader(
>       InputSplit genericSplit, JobConf job, Reporter reporter)
>       throws IOException {
>     reporter.setStatus(genericSplit.toString());
>     // Default delimiter; null means the standard newline handling
>     String delimiter = job.get("textinputformat.record.delimiter");
>     // Path of the file being read (only a FileSplit carries a path)
>     FileSplit split = (FileSplit) genericSplit;
>     String filePath = split.getPath().toUri().getPath();
>     // Path-to-delimiter mappings, obtained by parsing the <pathToDelimiter> file
>     Map<String, String> pathToDelimiterMap = parsePathToDelimiter();
>     for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
>       // Configured path
>       String configPath = entry.getKey();
>       // If configPath is a prefix of filePath, use the delimiter configured for it
>       if (filePath.startsWith(configPath)) {
>         delimiter = entry.getValue();
>       }
>     }
>     byte[] recordDelimiterBytes = null;
>     if (null != delimiter) {
>       recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
>     }
>     return new LineRecordReader(job, split, recordDelimiterBytes);
>   }
> }
> {code}
> *2. Modify Hive's CREATE TABLE handling to support <LINES TERMINATED BY>.*
> {code:java}
> create table test (
>   id string,
>   name string
> )
> row format delimited
> LINES TERMINATED BY '|@|'
> LOCATION hdfs_path;
> {code}
> When a user executes the SQL above, Hive inserts (hdfs_path, '|@|') into the
> <pathToDelimiter> file.
> HiveTextInputFormat is set as the default INPUTFORMAT.
> Looking forward to receiving your suggestions and feedback!
> *If you accept my idea, I hope you can assign the task to me. My Github
> account is: _lvhu-goodluck_*
> I really hope to contribute code to the community.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
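
One detail of the prefix lookup in the proposed getRecordReader may be worth pinning down. With a plain startsWith check, the result depends on map iteration order when several configured paths match, and /warehouse/t would also match /warehouse/table2. The following is only a sketch of one possible resolution rule (longest match on directory boundaries), not the reporter's implementation. The class DelimiterResolver and its methods are hypothetical names introduced for illustration and are not part of Hive or Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper (not a Hive/Hadoop class): picks a record delimiter for a file path. */
final class DelimiterResolver {

    private DelimiterResolver() {}

    /**
     * Returns the delimiter of the longest configured path that is a
     * directory-level prefix of filePath, or defaultDelimiter if none matches.
     */
    static String resolve(String filePath,
                          Map<String, String> pathToDelimiter,
                          String defaultDelimiter) {
        String best = defaultDelimiter;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : pathToDelimiter.entrySet()) {
            String configPath = stripTrailingSlash(entry.getKey());
            // Prefer the most specific (longest) matching path, independent of map order.
            if (isPathPrefix(configPath, filePath) && configPath.length() > bestLength) {
                bestLength = configPath.length();
                best = entry.getValue();
            }
        }
        return best;
    }

    /** True if configPath equals filePath or is one of its ancestor directories. */
    static boolean isPathPrefix(String configPath, String filePath) {
        if (!filePath.startsWith(configPath)) {
            return false;
        }
        if (configPath.endsWith("/")) {
            return true; // only the root "/" still ends with a slash here
        }
        // Require a path separator right after the prefix, so that
        // "/warehouse/t" does not match "/warehouse/table2".
        return filePath.length() == configPath.length()
                || filePath.charAt(configPath.length()) == '/';
    }

    private static String stripTrailingSlash(String path) {
        return path.length() > 1 && path.endsWith("/")
                ? path.substring(0, path.length() - 1)
                : path;
    }

    /** Converts a delimiter to the byte[] form a line reader expects; null stays null. */
    static byte[] toBytes(String delimiter) {
        return delimiter == null ? null : delimiter.getBytes(StandardCharsets.UTF_8);
    }
}
```

Under these assumptions, getRecordReader could call DelimiterResolver.resolve(filePath, pathToDelimiterMap, job.get("textinputformat.record.delimiter")) and pass the result of toBytes to LineRecordReader. Whether longest-prefix matching is the right rule for nested table and partition locations is a design question for the issue discussion.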