[GitHub] [tika] monkmachine commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

GitBox Wed, 18 May 2022 11:31:27 -0700


monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130370795


   If I read byte by byte (i.e. `byte[] bytes = new byte[1];`), I get the correct result:
   
![image](https://user-images.githubusercontent.com/36521886/169118333-e9a5509e-8fb4-4b28-9be4-6d326a03059a.png)
   
   If I read with any buffer larger than one byte, I get extra bytes/strings from some other part of the file:
   
![image](https://user-images.githubusercontent.com/36521886/169118508-6bd9559c-ffe9-4146-a74b-38141c585fbc.png)
   
   It only happens with this one JSON file, but I can reproduce it every time with that file.
   
   
   ```java
   @Test
   public void jsonConvert() throws FileNotFoundException, IOException {
       try (FileInputStream fis = new FileInputStream("c:\\temp1\\dwgreadout.json");
            FileOutputStream fos = new FileOutputStream("c:\\temp1\\dwgreadoutClean.json")) {
           byte[] bytes = new byte[1000];
           while (fis.read(bytes) != -1) {
               byte[] fixedBytes = new String(bytes, StandardCharsets.UTF_8)
                       //.replaceAll(dwgc.getCleanDwgReadRegexToReplace(), dwgc.getCleanDwgReadReplaceWith())
                       //.replaceAll(" nan ", " 0 ")
                       //.replaceAll(" nan,", " 0,")
                       .getBytes(StandardCharsets.UTF_8);
               String st = new String(fixedBytes, StandardCharsets.UTF_8);
               fos.write(fixedBytes, 0, fixedBytes.length);
           }
       }
   }
   ```
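
   One possible explanation, offered as a sketch and not a confirmed diagnosis for this particular file: `InputStream.read(byte[])` returns the number of bytes it actually filled, and the loop above discards that count. When a read fills only part of the buffer (typically the last chunk), the rest of the array still holds bytes from the previous read, and those get written out too. A one-byte buffer can never be partially filled, which would explain why that variant looks correct. The class and method names below (`ChunkedCopyDemo`, `copyIgnoringCount`, `copyUsingCount`) are made up for illustration, and the second method swaps in a `Reader` so that multi-byte UTF-8 characters split across chunk boundaries are decoded correctly.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.InputStreamReader;
   import java.io.OutputStream;
   import java.io.OutputStreamWriter;
   import java.io.Reader;
   import java.io.Writer;
   import java.nio.charset.StandardCharsets;

   class ChunkedCopyDemo {

       // Mirrors the loop in the test: the count returned by read() is discarded,
       // so the whole buffer is converted, including bytes left from the previous read.
       static void copyIgnoringCount(InputStream in, OutputStream out, int bufferSize) throws IOException {
           byte[] bytes = new byte[bufferSize];
           while (in.read(bytes) != -1) {
               out.write(new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8));
           }
       }

       // Decodes through a Reader and uses only the n chars that were actually read.
       static void copyUsingCount(InputStream in, OutputStream out, int bufferSize) throws IOException {
           Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
           Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
           char[] chars = new char[bufferSize];
           int n;
           while ((n = reader.read(chars)) != -1) {
               String chunk = new String(chars, 0, n);
               // Any per-chunk replaceAll(...) calls would be applied to chunk here.
               writer.write(chunk);
           }
           writer.flush();
       }

       public static void main(String[] args) throws IOException {
           byte[] input = "abcde".getBytes(StandardCharsets.UTF_8);

           ByteArrayOutputStream withStaleBytes = new ByteArrayOutputStream();
           copyIgnoringCount(new ByteArrayInputStream(input), withStaleBytes, 3);

           ByteArrayOutputStream clean = new ByteArrayOutputStream();
           copyUsingCount(new ByteArrayInputStream(input), clean, 3);

           System.out.println(new String(withStaleBytes.toByteArray(), StandardCharsets.UTF_8));
           System.out.println(new String(clean.toByteArray(), StandardCharsets.UTF_8));
       }
   }
   ```

   With a 3-byte buffer and the 5-byte input `abcde`, the first method writes `abcdec` (the trailing `c` is left over from the first read) and the second writes `abcde`. Note that a regex applied chunk by chunk would still miss matches that straddle a chunk boundary, so reading line by line or reading the whole file may be a better fit for the commented-out `replaceAll` calls.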


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
