monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130370795

   If I read byte by byte (i.e. byte[] bytes = new byte[1];) I get the correct 
result:
   
![image](https://user-images.githubusercontent.com/36521886/169118333-e9a5509e-8fb4-4b28-9be4-6d326a03059a.png)
   
   If I read with anything other than byte by byte I get added bytes/strings 
from some other part of the file:
   
![image](https://user-images.githubusercontent.com/36521886/169118508-6bd9559c-ffe9-4146-a74b-38141c585fbc.png)
   
   It only happens with this one JSON file, and I can reproduce it every 
time.
   
   
   ```java
       @Test
       public void jsonConvert() throws FileNotFoundException, IOException {
           try (FileInputStream fis = new FileInputStream("c:\\temp1\\dwgreadout.json");
                FileOutputStream fos = new FileOutputStream("c:\\temp1\\dwgreadoutClean.json")) {
               byte[] bytes = new byte[1000];
               while (fis.read(bytes) != -1) {
                   byte[] fixedBytes = new String(bytes, StandardCharsets.UTF_8)
                           //.replaceAll(dwgc.getCleanDwgReadRegexToReplace(), dwgc.getCleanDwgReadReplaceWith())
                           //.replaceAll(" nan ", " 0 ")
                           //.replaceAll(" nan,", " 0,")
                           .getBytes(StandardCharsets.UTF_8);
                   String st = new String(fixedBytes, StandardCharsets.UTF_8);
                   fos.write(fixedBytes, 0, fixedBytes.length);
               }
           }
       }
   ```
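
A likely cause, not confirmed against this particular file: `InputStream.read(byte[])` returns the number of bytes it actually read, and that number can be smaller than the buffer length (typically on the last chunk). The loop above ignores the return value and converts all 1000 bytes each time, so whatever an earlier read left in the tail of the buffer gets written out again. With a 1-byte buffer every successful read fills the whole buffer, which would explain why that case looks correct. Decoding fixed-size byte chunks can also split a multi-byte UTF-8 character across two reads. The sketch below avoids both by reading characters through a `Reader` and using only the first `n` chars of each chunk. The class name `ChunkedCopy`, the method `copyText`, and the in-memory streams are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika.
class ChunkedCopy {

    // Copies UTF-8 text in chunks, using only the chars each read() actually returned.
    static void copyText(InputStream in, OutputStream out, int bufferSize) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] chars = new char[bufferSize];
        int n;
        while ((n = reader.read(chars)) != -1) {
            // Only chars[0..n) are valid; the rest may be left over from an earlier read.
            String chunk = new String(chars, 0, n);
            // Any replaceAll(...) cleanup would be applied to 'chunk' here.
            writer.write(chunk);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory stand-in for the JSON file, including one two-byte UTF-8 character.
        byte[] input = "{\"a\": 1, \"b\": \"\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A buffer smaller than the input forces a short final read.
        copyText(new ByteArrayInputStream(input), out, 7);
        System.out.println(Arrays.equals(input, out.toByteArray()));
    }
}
```

If the byte-based loop is kept instead, the equivalent change is to capture the count (`int n = fis.read(bytes)`) and pass it on, for example `new String(bytes, 0, n, StandardCharsets.UTF_8)`. Either way, a pattern such as `" nan "` that straddles a chunk boundary will not be matched by a per-chunk `replaceAll`, so reading line by line or loading the whole file may be a better fit if the replacements are re-enabled.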


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.