[ https://issues.apache.org/jira/browse/SOLR-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070510#comment-16070510 ]

Andrew Lundgren edited comment on SOLR-10981 at 6/30/17 6:22 PM:
-----------------------------------------------------------------

Updated the code to handle .gzip as well as .gz, and mixed-case extensions.
Updated FileStream to determine the ContentType from the file suffix before
looking inside the file.
Added .csv extension support to the FileStream ContentType detection.
Modified the FileStream ContentType detection so that, when it does look inside
the file, it correctly handles leading whitespace and end-of-line characters.
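To make the suffix-first detection order concrete, here is a minimal Java sketch. It is not the code in SOLR-10981.patch: the class and method names (`ContentTypeGuesser`, `fromSuffix`, `fromContent`) are hypothetical, and the MIME type returned for each extension (for example `text/csv` for `.csv`) is an assumption. The extension check ignores case and a trailing `.gz` or `.gzip`; only when the extension is unrecognized does it look at the content, skipping leading whitespace and line breaks before inspecting the first character.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Solr's FileStream): guess a content type from the
 * file name first, then fall back to the first non-whitespace character.
 */
final class ContentTypeGuesser {
  private ContentTypeGuesser() {}

  /** Lower-cases the name and strips ".gz" or ".gzip" so "Data.CSV.GZ" is judged as ".csv". */
  static String stripCompressionSuffix(String name) {
    String lower = name.toLowerCase(Locale.ROOT);
    if (lower.endsWith(".gzip")) {
      return lower.substring(0, lower.length() - ".gzip".length());
    }
    if (lower.endsWith(".gz")) {
      return lower.substring(0, lower.length() - ".gz".length());
    }
    return lower;
  }

  /** Returns a type for a known extension, or null if the extension is not recognized. */
  static String fromSuffix(String name) {
    String base = stripCompressionSuffix(name);
    if (base.endsWith(".csv")) return "text/csv";          // assumed mapping
    if (base.endsWith(".json")) return "application/json";
    if (base.endsWith(".xml")) return "application/xml";
    return null;
  }

  /** Skips spaces, tabs and line breaks, then inspects the first real character. */
  static String fromContent(CharSequence head) {
    for (int i = 0; i < head.length(); i++) {
      char c = head.charAt(i);
      if (Character.isWhitespace(c)) {
        continue;
      }
      if (c == '<') return "application/xml";
      if (c == '{' || c == '[') return "application/json";
      return null;
    }
    return null;
  }

  /** Suffix wins; content sniffing is only a fallback. */
  static String guess(String name, CharSequence head) {
    String bySuffix = fromSuffix(name);
    return bySuffix != null ? bySuffix : fromContent(head);
  }
}
```

For example, `guess("Data.CSV.GZ", "")` yields `text/csv` without looking at the content, while `guess("upload.bin", " \n{}")` falls through to the content check and yields `application/json`.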

FileStream and file:// URLStream now behave consistently when application/gzip,
application/octet-stream, or content/unknown is returned.
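One way to read this consistency fix: treat `application/gzip`, `application/octet-stream`, and `content/unknown` as saying nothing about the payload format, and fall back to a name-based guess whenever either stream reports one of them. The Java sketch below illustrates that rule under those assumptions; `ContentTypeNormalizer`, `isGeneric`, and `resolve` are hypothetical names, not Solr's API, and the name-based guess is assumed to be computed elsewhere.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: decide when a reported content type should give way to a name-based guess. */
final class ContentTypeNormalizer {
  private ContentTypeNormalizer() {}

  // Reported types assumed to carry no information about the (decompressed) payload format.
  private static final Set<String> GENERIC =
      Set.of("application/gzip", "application/octet-stream", "content/unknown");

  /** True for null, blank, or one of the generic types; parameters and case are ignored. */
  static boolean isGeneric(String reported) {
    if (reported == null) {
      return true;
    }
    int semi = reported.indexOf(';');
    String bare = (semi >= 0 ? reported.substring(0, semi) : reported)
        .trim().toLowerCase(Locale.ROOT);
    return bare.isEmpty() || GENERIC.contains(bare);
  }

  /** Keeps a specific reported type; otherwise returns the name-based guess (possibly null). */
  static String resolve(String reported, String guessedFromName) {
    return isGeneric(reported) ? guessedFromName : reported;
  }
}
```

Under this rule, a file:// URL that reports `content/unknown` and the same file opened directly would both end up with the name-based type.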




> Allow update to load gzip files 
> --------------------------------
>
>                 Key: SOLR-10981
>                 URL: https://issues.apache.org/jira/browse/SOLR-10981
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 6.6
>            Reporter: Andrew Lundgren
>              Labels: patch
>             Fix For: 4.10.4, 6.6, master (7.0)
>
>         Attachments: SOLR-10981.patch, SOLR-10981.patch, SOLR-10981.patch
>
>
> We currently import large CSV files.  We store them gzipped because they
> achieve around 80% compression.
> To import them, we must first gunzip them and then run the import; after that,
> the decompressed files are no longer needed.
> This patch allows gzipped files to be opened directly, whether they are
> referenced by URL or are local files.
> For URLs, the stream is treated as gzipped if the content encoding is "gzip"
> or the file name ends in ".gz".
> For local files, a file whose name ends in ".gz" is assumed to be gzipped.
> I have tested the patch with 4.10.4, 6.6.0 and master from git.
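The detection rules in the description can be sketched in Java with the JDK's `java.util.zip.GZIPInputStream`. This is an illustrative sketch, not the patch itself: `GzipAwareOpener`, `openFile`, and `openUrl` are hypothetical names. Following the later comment on this issue, the suffix check is case-insensitive and accepts `.gzip` as well as `.gz`; for URLs, a `Content-Encoding` of `gzip` also triggers decompression.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: opens local files or URLs, gunzipping when they look gzipped. */
final class GzipAwareOpener {
  private GzipAwareOpener() {}

  /** Case-insensitive check for ".gz" or ".gzip" at the end of a name or path. */
  static boolean hasGzipSuffix(String name) {
    String lower = name.toLowerCase(Locale.ROOT);
    return lower.endsWith(".gz") || lower.endsWith(".gzip");
  }

  /** Wraps in a GZIPInputStream if needed; closes the raw stream if the gzip header is invalid. */
  private static InputStream wrap(InputStream in, boolean gzipped) throws IOException {
    if (!gzipped) {
      return in;
    }
    try {
      return new GZIPInputStream(in);
    } catch (IOException e) {
      in.close();
      throw e;
    }
  }

  /** Local files: the decision is based on the file name alone. */
  static InputStream openFile(Path file) throws IOException {
    InputStream in = new BufferedInputStream(Files.newInputStream(file));
    return wrap(in, hasGzipSuffix(file.toString()));
  }

  /** URLs: gunzip if the reported content encoding is gzip or the path has a gzip suffix. */
  static InputStream openUrl(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    boolean gzipped = "gzip".equalsIgnoreCase(conn.getContentEncoding())
        || hasGzipSuffix(url.getPath());
    InputStream in = new BufferedInputStream(conn.getInputStream());
    return wrap(in, gzipped);
  }
}
```

`HttpURLConnection` does not transparently decompress a `Content-Encoding: gzip` response, so the wrapping is needed for HTTP URLs too; for `file://` URLs `getContentEncoding()` typically returns null, so only the suffix rule applies there.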



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
