lvhu created HIVE-27590:
---------------------------

             Summary: Make LINES TERMINATED BY work when creating table
                 Key: HIVE-27590
                 URL: https://issues.apache.org/jira/browse/HIVE-27590
             Project: Hive
          Issue Type: Improvement
          Components: Hive, SQL
    Affects Versions: 3.1.3
         Environment: {code:java}
// code placeholder
{code}
            Reporter: lvhu
            Assignee: lvhu

*Currently, the only way to set a custom line delimiter when creating a table in Hive is to write a custom InputFormat, like this:*
{code:java}
package abc.hive;

public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
        implements JobConfigurable {
    ...
}

create table test (
    id string,
    name string
)
STORED AS
    INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
If several different record delimiters are in use, a separate TextInputFormat has to be written for each of them.

Unfortunately, the ideal syntax is not supported yet:
{code:java}
create table test (
    id string,
    name string
)
row format delimited
    fields terminated by '\t'     -- supported
    LINES TERMINATED BY '|@|';    -- not supported
{code}
I have a solution that supports setting the line delimiter at table-creation time, exactly as in the statement above.

*1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*

HiveTextInputFormat reads a <pathToDelimiter> file and chooses the record delimiter for each input file based on the prefix of that file's path.
{code:java}
public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
        implements JobConfigurable {
    ....
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(genericSplit.toString());
        // InputSplit itself has no getPath(); the split must be a FileSplit
        FileSplit fileSplit = (FileSplit) genericSplit;
        // Default delimiter from the job configuration
        String delimiter = job.get("textinputformat.record.delimiter");
        // Path of the file backing this split
        String filePath = fileSplit.getPath().toUri().getPath();
        // Path-prefix -> delimiter mapping, obtained by parsing the <pathToDelimiter> file
        Map<String, String> pathToDelimiterMap = parsePathToDelimiter();
        for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
            // Configured path
            String configPath = entry.getKey();
            // If configPath is a prefix of filePath, use the delimiter configured for it
            if (filePath.startsWith(configPath)) {
                delimiter = entry.getValue();
            }
        }
        byte[] recordDelimiterBytes = null;
        if (null != delimiter) {
            recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
        }
        return new LineRecordReader(job, fileSplit, recordDelimiterBytes);
    }
}
{code}
*2. Modify Hive's CREATE TABLE handling to support LINES TERMINATED BY.*
{code:java}
create table test (
    id string,
    name string
)
row format delimited
    LINES TERMINATED BY '|@|'
LOCATION hdfs_path;
{code}
When a user executes the SQL above, Hive inserts the entry (hdfs_path, '|@|') into the <pathToDelimiter> file.

Looking forward to receiving your suggestions and feedback!

*If you accept my idea, I hope you can assign the task to me. My GitHub account is: _lvhu-goodluck_*

I really hope to contribute code to the community.
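
One detail the prefix loop in step 1 leaves open is which entry wins when several configured paths are prefixes of the same file, for example when one table location is nested inside another. The following is only a sketch of one possible rule, assuming the longest matching prefix should win and that prefixes should match on whole path components (so /warehouse/test does not match /warehouse/test2). DelimiterResolver, resolve and isPathPrefix are hypothetical names used for illustration; they are not part of Hive or Hadoop, and the map stands in for the parsed <pathToDelimiter> file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hive or Hadoop. */
final class DelimiterResolver {

    private DelimiterResolver() {}

    /**
     * Returns the delimiter of the longest configured path that is a
     * component-wise prefix of filePath, or defaultDelimiter if none matches.
     */
    static String resolve(String filePath,
                          Map<String, String> pathToDelimiter,
                          String defaultDelimiter) {
        String best = defaultDelimiter;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : pathToDelimiter.entrySet()) {
            String prefix = entry.getKey();
            if (isPathPrefix(prefix, filePath) && prefix.length() > bestLength) {
                bestLength = prefix.length();
                best = entry.getValue();
            }
        }
        return best;
    }

    /** True if prefix equals filePath or names one of its parent directories. */
    static boolean isPathPrefix(String prefix, String filePath) {
        if (prefix.isEmpty()) {
            return false;
        }
        if (filePath.equals(prefix)) {
            return true;
        }
        String dir = prefix.endsWith("/") ? prefix : prefix + "/";
        return filePath.startsWith(dir);
    }

    public static void main(String[] args) {
        // Stand-in for the parsed <pathToDelimiter> file (assumed contents).
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("/warehouse/test", "|@|");
        mapping.put("/warehouse", "##");

        // Nested location: the longer prefix wins regardless of map order.
        System.out.println(resolve("/warehouse/test/part-00000", mapping, "\n"));  // prints |@|
        // Sibling directory: only /warehouse matches on a component boundary.
        System.out.println(resolve("/warehouse/test2/part-00000", mapping, "\n")); // prints ##
    }
}
```

With plain startsWith and an unordered map, the result for nested locations would depend on iteration order; picking the longest prefix makes the choice deterministic. Whether this rule is the right one for Hive is a design question for the actual patch.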