[ 
https://issues.apache.org/jira/browse/TIKA-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209812#comment-13209812
 ] 

Richard Yu commented on TIKA-862:
---------------------------------

The one I sent earlier does not pass the h5dump test.  It also does not pass the 
Tika test (i.e. it just showed 4 lines).
I deleted that file from my test samples, and here are the rest that I kept:
[ryu@localhost hdf5]$ ls
IICMO_npp_d20120119_t1301328_e1302569_b01180_c20120119195316463240_noaa_ops.h5
RNSCA_npp_d20111121_t1935200_e1935400_b00346_c20111122203300301515_noaa_ops.h5
SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
VSTYO_npp_d20120120_t0617066_e0618308_b01190_c20120120123536501739_noaa_ops.hdf5

[ryu@localhost hdf5]$ java -jar /usr/local/extractors/tika-app-1.0.jar -m 
IICMO_npp_d20120119_t1301328_e1302569_b01180_c20120119195316463240_noaa_ops.h5 
Content-Encoding: windows-1252
Content-Length: 14800864
Content-Type: text/plain
resourceName: 
IICMO_npp_d20120119_t1301328_e1302569_b01180_c20120119195316463240_noaa_ops.h5

[ryu@localhost hdf5]$ java -jar /usr/local/extractors/tika-app-1.0.jar -m 
RNSCA_npp_d20111121_t1935200_e1935400_b00346_c20111122203300301515_noaa_ops.h5 
Content-Encoding: windows-1252
Content-Length: 20888
Content-Type: text/plain
resourceName: 
RNSCA_npp_d20111121_t1935200_e1935400_b00346_c20111122203300301515_noaa_ops.h5

[ryu@localhost hdf5]$ java -jar /usr/local/extractors/tika-app-1.0.jar -m 
SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5 
Content-Encoding: windows-1252
Content-Length: 22187952
Content-Type: text/plain
resourceName: 
SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5

[ryu@localhost hdf5]$ java -jar /usr/local/extractors/tika-app-1.0.jar -m 
VSTYO_npp_d20120120_t0617066_e0618308_b01190_c20120120123536501739_noaa_ops.hdf5
 
Content-Encoding: windows-1252
Content-Length: 12328128
Content-Type: text/plain
resourceName: 
VSTYO_npp_d20120120_t0617066_e0618308_b01190_c20120120123536501739_noaa_ops.hdf5


All of them work with h5dump.  All of them are huge files except RNSCA....
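
Since h5dump accepts these files but Tika reports text/plain, one thing worth 
checking is where the HDF5 signature sits in each file.  The HDF5 format 
signature is the 8 bytes \x89 H D F \r \n \x1a \n, and it may start at offset 0 
or at 512, 1024, 2048, and so on.  Below is a minimal sketch of such a check.  
The function name find_hdf5_signature and the max_offset limit are made up for 
illustration, and I have not confirmed that the signature offset is what 
trips up Tika here:

```python
import io

# The 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(stream, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if not found.

    Candidate offsets are 0, 512, 1024, 2048, ... (each one doubles the
    previous), up to the hypothetical search limit max_offset.
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None
```

To try it on a file, open it in binary mode and pass the handle, for example 
find_hdf5_signature(open(path, "rb")).  A result of None would mean the file 
has no signature at any candidate offset up to the limit.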


I will download more, smaller files and test them against Tika/h5dump.  Does 
this information help you?  Let me know.  Thanks!
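
For testing a larger batch, the "Key: value" output of the -m runs above could 
be checked mechanically.  This is only a rough sketch: parse_tika_metadata and 
looks_undetected are hypothetical helper names, and the continuation handling 
assumes values may wrap onto the next line the way resourceName does in this 
message:

```python
def parse_tika_metadata(output):
    """Parse 'Key: value' lines (as printed by the -m runs) into a dict.

    A line that does not start with a key followed by ':' is treated as
    a continuation of the previous key's value.
    """
    metadata = {}
    last_key = None
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and " " not in key:
            last_key = key
            metadata[key] = value.strip()
        elif last_key is not None:
            # Wrapped value: append it to the previous key.
            metadata[last_key] = (metadata[last_key] + " " + line).strip()
    return metadata


def looks_undetected(metadata):
    """True if the file fell through to the text/plain default."""
    return metadata.get("Content-Type", "").startswith("text/plain")
```

Any .h5 file for which looks_undetected returns True would be one that Tika did 
not recognize as HDF5.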

Richard




                
> JPSS HDF5 files not being detected appropriately
> ------------------------------------------------
>
>                 Key: TIKA-862
>                 URL: https://issues.apache.org/jira/browse/TIKA-862
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Richard Yu
>            Assignee: Chris A. Mattmann
>         Attachments: 
> RNSCA-ROLPS_npp_d20120202_t1841338_e1842112_b01382_c20120202203730692328_noaa_ops.h5,
>  
> RNSCA-ROLPS_npp_d20120202_t1841338_e1842112_b01382_c20120202203730692328_noaa_ops.h5,
>  
> RNSCA_npp_d20111121_t1935200_e1935400_b00346_c20111122203300301515_noaa_ops.h5
>
>
> As commented in TIKA-614, JPSS HDF 5 files are not being properly detected by 
> Tika. See this:
> from [~minfing]:
> {quote}
> We were trying to extract metadata from our h5 file (i.e. with JPSS 
> extension). We ran the following command line:
> {noformat}
> [ryu@localhost hdf5extractor]$ java -jar tika-app-1.0.jar -m \
> > /usr/local/staging/products/h5/SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
> Content-Encoding: windows-1252
> Content-Length: 22187952
> Content-Type: text/plain
> resourceName: 
> SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
> [ryu@localhost hdf5extractor]$
> {noformat}
> We noticed that the content type is text/plain and there are only 4 lines of 
> output (i.e. we expected a lot more metadata).
> Let me know if more information is needed. Thanks!
> Richard
> {quote}


        
