[GitHub] [orc] guiyanakuang commented on pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the internal defau

2021-12-02 Thread GitBox


guiyanakuang commented on pull request #967:
URL: https://github.com/apache/orc/pull/967#issuecomment-985290716


   > +1, LGTM. Thank you, @guiyanakuang . For C++ case, could you file a new 
JIRA for that? [ORC-1053](https://issues.apache.org/jira/browse/ORC-1053) is 
enough for Java-only patch
   
   Okay, I created ORC-1055.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (ORC-1055) [C++] Timestamp values read in Hive are different when using ORC file created using CSV to ORC converter tools

2021-12-02 Thread Yiqun Zhang (Jira)
Yiqun Zhang created ORC-1055:


 Summary: [C++] Timestamp values read in Hive are different when using ORC file created using CSV to ORC converter tools
 Key: ORC-1055
 URL: https://issues.apache.org/jira/browse/ORC-1055
 Project: ORC
 Issue Type: Bug
 Components: C++
 Reporter: Yiqun Zhang
 Attachments: converted_by_cpp.orc, timestamp.csv

I have a CSV file with a column containing timestamp values such as 0001-01-01 
00:00:00.0. I convert the CSV file to an ORC file using the CSV-to-ORC converter 
and place the ORC file in a Hive table backed by ORC files. Querying the data 
with Hive beeline and Spark SQL then returns different results.

If converted using the C++ tool, the value read by Hive beeline and Spark SQL 
queries is 0001-01-03 00:00:00.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [orc] dongjoon-hyun commented on pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the internal defa

2021-12-02 Thread GitBox


dongjoon-hyun commented on pull request #967:
URL: https://github.com/apache/orc/pull/967#issuecomment-985280738


   Merged to main/1.7.






[GitHub] [orc] dongjoon-hyun merged pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the internal default pr

2021-12-02 Thread GitBox


dongjoon-hyun merged pull request #967:
URL: https://github.com/apache/orc/pull/967


   






[GitHub] [orc] guiyanakuang commented on pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the internal defau

2021-12-02 Thread GitBox


guiyanakuang commented on pull request #967:
URL: https://github.com/apache/orc/pull/967#issuecomment-984469903


   > Also, cc @wgtmac since C++ tool seems to have the same issue according to 
the JIRA.
   
   I have also debugged some of the C++ code. It looks like `strptime` cannot 
convert values before 1970 on macOS.
   
https://github.com/apache/orc/blob/334bf1f2c605f38c7e75ec81d1dab93c31fc8459/tools/src/CSVFileImport.cc#L257






[GitHub] [orc] guiyanakuang commented on a change in pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the in

2021-12-02 Thread GitBox


guiyanakuang commented on a change in pull request #967:
URL: https://github.com/apache/orc/pull/967#discussion_r760927707



##
File path: java/tools/src/test/org/apache/orc/tools/convert/TestConvert.java
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.tools.convert;
+
+import org.apache.commons.cli.ParseException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Timestamp;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestConvert {
+
+  Path workDir = new Path(System.getProperty("test.tmp.dir"));
+  Configuration conf;
+  FileSystem fs;
+  Path testFilePath;
+
+  @BeforeEach
+  public void openFileSystem () throws Exception {
+    conf = new Configuration();
+    fs = FileSystem.getLocal(conf);
+    fs.setWorkingDirectory(workDir);
+    testFilePath = new Path("TestConvert.testConvert.orc");
+    fs.delete(testFilePath, false);
+  }
+
+  @Test
+  public void testConvertCustomTimestampFromCsv() throws IOException, ParseException {
+    Path csvFile = new Path("test.csv");
+    FSDataOutputStream stream = fs.create(csvFile, true);
+    String[] timeValues = new String[] {"0001-01-01 00:00:00.000", "2021-12-01 18:36:00.800"};
+    stream.writeBytes(String.join("\n", timeValues));
+    stream.close();
+    String schema = "struct<t:timestamp>";
+    String timestampFormat = "yyyy-MM-dd HH:mm:ss.SSS";
+    TypeDescription readSchema = TypeDescription.fromString(schema);
+
+    ConvertTool.main(conf, new String[]{"--schema", schema, "-o", testFilePath.toString(),
+        "-t", timestampFormat, csvFile.toString()});
+
+    assertTrue(fs.exists(testFilePath));
+
+    Reader reader = OrcFile.createReader(testFilePath, OrcFile.readerOptions(conf));
+    VectorizedRowBatch batch = readSchema.createRowBatch();
+    RecordReader rowIterator = reader.rows(reader.options().schema(readSchema));
+    TimestampColumnVector tcv = (TimestampColumnVector) batch.cols[0];
+    rowIterator.nextBatch(batch);
+
+    while (rowIterator.nextBatch(batch)) {
+      for (int row = 0; row < batch.size; ++row) {
+        Timestamp timestamp = Timestamp.valueOf(timeValues[row]);
+        assertEquals(timestamp.getTime(), tcv.time[row]);
+        assertEquals(timestamp.getNanos(), tcv.nanos[row]);

Review comment:
   @dongjoon-hyun I added two hooks using annotations to ensure that the 
default time zone while testing this class is America/New_York, which feels good. 
I also removed a redundant statement :sweat_smile:.
   You can try again when you have time; I'm sure the old code will not pass 
the test.
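
   The two hooks described above can be sketched roughly as follows. This is an 
illustrative standalone sketch, not the actual patch: the class name 
`TimestampTzDemo`, the method names, and the sample value are assumptions; in 
the real test the hooks would be JUnit 5 `@BeforeAll`/`@AfterAll` methods.

   ```java
   import java.sql.Timestamp;
   import java.util.TimeZone;

   public class TimestampTzDemo {
       static TimeZone saved;

       // Would be @BeforeAll in JUnit 5: pin the JVM default time zone
       // so Timestamp.valueOf produces the same epoch millis everywhere.
       static void setUp() {
           saved = TimeZone.getDefault();
           TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"));
       }

       // Would be @AfterAll: restore the tester's original time zone.
       static void tearDown() {
           TimeZone.setDefault(saved);
       }

       public static void main(String[] args) {
           setUp();
           try {
               // Timestamp.valueOf interprets the string in the default zone,
               // so pinning the zone makes the result reproducible.
               Timestamp ts = Timestamp.valueOf("2021-12-01 18:36:00.800");
               System.out.println(ts.getTime()); // prints 1638401760800
           } finally {
               tearDown();
           }
       }
   }
   ```

   Restoring the saved zone in a `finally`-style teardown matters: without it, a 
failed test would leak the New York default into every later test in the JVM.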








[GitHub] [orc] guiyanakuang commented on a change in pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the in

2021-12-02 Thread GitBox


guiyanakuang commented on a change in pull request #967:
URL: https://github.com/apache/orc/pull/967#discussion_r760890059



##
File path: java/tools/src/test/org/apache/orc/tools/convert/TestConvert.java
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.tools.convert;
+
+import org.apache.commons.cli.ParseException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Timestamp;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestConvert {
+
+  Path workDir = new Path(System.getProperty("test.tmp.dir"));
+  Configuration conf;
+  FileSystem fs;
+  Path testFilePath;
+
+  @BeforeEach
+  public void openFileSystem () throws Exception {
+    conf = new Configuration();
+    fs = FileSystem.getLocal(conf);
+    fs.setWorkingDirectory(workDir);
+    testFilePath = new Path("TestConvert.testConvert.orc");
+    fs.delete(testFilePath, false);
+  }
+
+  @Test
+  public void testConvertCustomTimestampFromCsv() throws IOException, ParseException {
+    Path csvFile = new Path("test.csv");
+    FSDataOutputStream stream = fs.create(csvFile, true);
+    String[] timeValues = new String[] {"0001-01-01 00:00:00.000", "2021-12-01 18:36:00.800"};
+    stream.writeBytes(String.join("\n", timeValues));
+    stream.close();
+    String schema = "struct<t:timestamp>";
+    String timestampFormat = "yyyy-MM-dd HH:mm:ss.SSS";
+    TypeDescription readSchema = TypeDescription.fromString(schema);
+
+    ConvertTool.main(conf, new String[]{"--schema", schema, "-o", testFilePath.toString(),
+        "-t", timestampFormat, csvFile.toString()});
+
+    assertTrue(fs.exists(testFilePath));
+
+    Reader reader = OrcFile.createReader(testFilePath, OrcFile.readerOptions(conf));
+    VectorizedRowBatch batch = readSchema.createRowBatch();
+    RecordReader rowIterator = reader.rows(reader.options().schema(readSchema));
+    TimestampColumnVector tcv = (TimestampColumnVector) batch.cols[0];
+    rowIterator.nextBatch(batch);
+
+    while (rowIterator.nextBatch(batch)) {
+      for (int row = 0; row < batch.size; ++row) {
+        Timestamp timestamp = Timestamp.valueOf(timeValues[row]);
+        assertEquals(timestamp.getTime(), tcv.time[row]);
+        assertEquals(timestamp.getNanos(), tcv.nanos[row]);

Review comment:
   Maybe I should pin the time zone for the test, but many methods inside ORC 
read the local default time zone, so overriding it uniformly is not 
straightforward at the moment. I need to think about it.








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the i

2021-12-02 Thread GitBox


dongjoon-hyun commented on a change in pull request #967:
URL: https://github.com/apache/orc/pull/967#discussion_r760887860



##
File path: java/tools/src/test/org/apache/orc/tools/convert/TestConvert.java
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.tools.convert;
+
+import org.apache.commons.cli.ParseException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Timestamp;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestConvert {
+
+  Path workDir = new Path(System.getProperty("test.tmp.dir"));
+  Configuration conf;
+  FileSystem fs;
+  Path testFilePath;
+
+  @BeforeEach
+  public void openFileSystem () throws Exception {
+    conf = new Configuration();
+    fs = FileSystem.getLocal(conf);
+    fs.setWorkingDirectory(workDir);
+    testFilePath = new Path("TestConvert.testConvert.orc");
+    fs.delete(testFilePath, false);
+  }
+
+  @Test
+  public void testConvertCustomTimestampFromCsv() throws IOException, ParseException {
+    Path csvFile = new Path("test.csv");
+    FSDataOutputStream stream = fs.create(csvFile, true);
+    String[] timeValues = new String[] {"0001-01-01 00:00:00.000", "2021-12-01 18:36:00.800"};
+    stream.writeBytes(String.join("\n", timeValues));
+    stream.close();
+    String schema = "struct<t:timestamp>";
+    String timestampFormat = "yyyy-MM-dd HH:mm:ss.SSS";
+    TypeDescription readSchema = TypeDescription.fromString(schema);
+
+    ConvertTool.main(conf, new String[]{"--schema", schema, "-o", testFilePath.toString(),
+        "-t", timestampFormat, csvFile.toString()});
+
+    assertTrue(fs.exists(testFilePath));
+
+    Reader reader = OrcFile.createReader(testFilePath, OrcFile.readerOptions(conf));
+    VectorizedRowBatch batch = readSchema.createRowBatch();
+    RecordReader rowIterator = reader.rows(reader.options().schema(readSchema));
+    TimestampColumnVector tcv = (TimestampColumnVector) batch.cols[0];
+    rowIterator.nextBatch(batch);
+
+    while (rowIterator.nextBatch(batch)) {
+      for (int row = 0; row < batch.size; ++row) {
+        Timestamp timestamp = Timestamp.valueOf(timeValues[row]);
+        assertEquals(timestamp.getTime(), tcv.time[row]);
+        assertEquals(timestamp.getNanos(), tcv.nanos[row]);

Review comment:
   Can we make the test case independent of the tester's environment?








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the i

2021-12-02 Thread GitBox


dongjoon-hyun commented on a change in pull request #967:
URL: https://github.com/apache/orc/pull/967#discussion_r760887314



##
File path: java/tools/src/test/org/apache/orc/tools/convert/TestConvert.java
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.tools.convert;
+
+import org.apache.commons.cli.ParseException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Timestamp;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestConvert {
+
+  Path workDir = new Path(System.getProperty("test.tmp.dir"));
+  Configuration conf;
+  FileSystem fs;
+  Path testFilePath;
+
+  @BeforeEach
+  public void openFileSystem () throws Exception {
+    conf = new Configuration();
+    fs = FileSystem.getLocal(conf);
+    fs.setWorkingDirectory(workDir);
+    testFilePath = new Path("TestConvert.testConvert.orc");
+    fs.delete(testFilePath, false);
+  }
+
+  @Test
+  public void testConvertCustomTimestampFromCsv() throws IOException, ParseException {
+    Path csvFile = new Path("test.csv");
+    FSDataOutputStream stream = fs.create(csvFile, true);
+    String[] timeValues = new String[] {"0001-01-01 00:00:00.000", "2021-12-01 18:36:00.800"};
+    stream.writeBytes(String.join("\n", timeValues));
+    stream.close();
+    String schema = "struct<t:timestamp>";
+    String timestampFormat = "yyyy-MM-dd HH:mm:ss.SSS";
+    TypeDescription readSchema = TypeDescription.fromString(schema);
+
+    ConvertTool.main(conf, new String[]{"--schema", schema, "-o", testFilePath.toString(),
+        "-t", timestampFormat, csvFile.toString()});
+
+    assertTrue(fs.exists(testFilePath));
+
+    Reader reader = OrcFile.createReader(testFilePath, OrcFile.readerOptions(conf));
+    VectorizedRowBatch batch = readSchema.createRowBatch();
+    RecordReader rowIterator = reader.rows(reader.options().schema(readSchema));
+    TimestampColumnVector tcv = (TimestampColumnVector) batch.cols[0];
+    rowIterator.nextBatch(batch);
+
+    while (rowIterator.nextBatch(batch)) {
+      for (int row = 0; row < batch.size; ++row) {
+        Timestamp timestamp = Timestamp.valueOf(timeValues[row]);
+        assertEquals(timestamp.getTime(), tcv.time[row]);
+        assertEquals(timestamp.getNanos(), tcv.nanos[row]);

Review comment:
   The difference between you and me seems to be the time zones: you are in an 
Asia time zone and I'm in the America/Los_Angeles time zone.








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #967: ORC-1053: Fix time zone offset precision when convert tool converts `LocalDateTime` to `Timestamp` is not consistent with the i

2021-12-02 Thread GitBox


dongjoon-hyun commented on a change in pull request #967:
URL: https://github.com/apache/orc/pull/967#discussion_r760886324



##
File path: java/tools/src/test/org/apache/orc/tools/convert/TestConvert.java
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.tools.convert;
+
+import org.apache.commons.cli.ParseException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.sql.Timestamp;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestConvert {
+
+  Path workDir = new Path(System.getProperty("test.tmp.dir"));
+  Configuration conf;
+  FileSystem fs;
+  Path testFilePath;
+
+  @BeforeEach
+  public void openFileSystem () throws Exception {
+    conf = new Configuration();
+    fs = FileSystem.getLocal(conf);
+    fs.setWorkingDirectory(workDir);
+    testFilePath = new Path("TestConvert.testConvert.orc");
+    fs.delete(testFilePath, false);
+  }
+
+  @Test
+  public void testConvertCustomTimestampFromCsv() throws IOException, ParseException {
+    Path csvFile = new Path("test.csv");
+    FSDataOutputStream stream = fs.create(csvFile, true);
+    String[] timeValues = new String[] {"0001-01-01 00:00:00.000", "2021-12-01 18:36:00.800"};
+    stream.writeBytes(String.join("\n", timeValues));
+    stream.close();
+    String schema = "struct<t:timestamp>";
+    String timestampFormat = "yyyy-MM-dd HH:mm:ss.SSS";
+    TypeDescription readSchema = TypeDescription.fromString(schema);
+
+    ConvertTool.main(conf, new String[]{"--schema", schema, "-o", testFilePath.toString(),
+        "-t", timestampFormat, csvFile.toString()});
+
+    assertTrue(fs.exists(testFilePath));
+
+    Reader reader = OrcFile.createReader(testFilePath, OrcFile.readerOptions(conf));
+    VectorizedRowBatch batch = readSchema.createRowBatch();
+    RecordReader rowIterator = reader.rows(reader.options().schema(readSchema));
+    TimestampColumnVector tcv = (TimestampColumnVector) batch.cols[0];
+    rowIterator.nextBatch(batch);
+
+    while (rowIterator.nextBatch(batch)) {
+      for (int row = 0; row < batch.size; ++row) {
+        Timestamp timestamp = Timestamp.valueOf(timeValues[row]);
+        assertEquals(timestamp.getTime(), tcv.time[row]);
+        assertEquals(timestamp.getNanos(), tcv.nanos[row]);

Review comment:
   I ran this yesterday and today. It passed on my laptop without your patch.
   ```
   $ git diff main --stat
    java/tools/src/test/org/apache/orc/tools/convert/TestConvert.java | 89 +
    1 file changed, 89 insertions(+)

   $ mvn package -pl tools -Dtest=org.apache.orc.tools.convert.TestConvert
   ...
   [INFO] ---
   [INFO]  T E S T S
   [INFO] ---
   [INFO] Running org.apache.orc.tools.convert.TestConvert
   Processing test.csv
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.337 s - in org.apache.orc.tools.convert.TestConvert
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   ...
   ```



