Re: [basex-talk] get and extract .gz files from web

2016-01-27 Thread Christian Grün
Hi Andy,

>> Expecting to see "example.json"
>> Could this be a bug?

Indeed, GZIP files may also contain the original filename (section 2.3.1
of [1]), but the relevant bytes are skipped by the Java standard
library [2]. It would be possible to copy and update the class; let’s see.
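
To illustrate what reading those bytes involves, here is a minimal Java
sketch based on the header layout in [1]. It is not BaseX code and not the
actual fix; the class name GzipName and the method readFileName are made up
for this example. It assumes the stream is positioned at the start of a
GZIP member and only looks at the optional FEXTRA and FNAME fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class GzipName {
  private static final int FEXTRA = 0x04; // FLG bit 2: extra field present
  private static final int FNAME = 0x08;  // FLG bit 3: original file name present

  /** Returns the file name stored in the GZIP header, or null if there is none. */
  static String readFileName(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    // Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS
    byte[] header = new byte[10];
    data.readFully(header);
    if ((header[0] & 0xFF) != 0x1F || (header[1] & 0xFF) != 0x8B) {
      throw new IOException("Not in GZIP format");
    }
    int flags = header[3] & 0xFF;
    if ((flags & FEXTRA) != 0) {
      // XLEN is a 2-byte little-endian length, followed by XLEN bytes to skip
      int xlen = data.readUnsignedByte() | (data.readUnsignedByte() << 8);
      data.readFully(new byte[xlen]);
    }
    if ((flags & FNAME) == 0) return null;
    // The name is zero-terminated and encoded as ISO 8859-1 (Latin-1)
    ByteArrayOutputStream name = new ByteArrayOutputStream();
    for (int b; (b = data.readUnsignedByte()) != 0;) name.write(b);
    return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
  }
}
```

The method could be called on the bytes of a fetched .gz file before they
are handed to the decompressor. Whether a name is present at all depends on
the tool that created the file.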

And hi Marc,

> I also think that this could be a bug or at least a good improvement
> to make as the docs say gzip archives can be created. Christian, you
> think we should file an issue for this?

What exactly do you mean by that? Are you also referring to Andy’s
observation that the filename is not included in the returned archive
description?

Christian

[1] http://www.ietf.org/rfc/rfc1952.txt
[2] http://www.docjar.com/html/api/java/util/zip/GZIPInputStream.java.html#181




Re: [basex-talk] get and extract .gz files from web

2016-01-27 Thread Marc van Grootel
Hi Andy,

Nice use of syntax (though you have to lose the semicolon, of course).

Visually, I like the arrow operator a lot. It looks like a pipeline:

"https://wiki.mozilla.org/images/f/ff/Example.json.gz"
=> fetch:binary()
=> archive:extract-text()

I also think that this could be a bug, or at least a good improvement
to make, as the docs say gzip archives can be created. Christian, do you
think we should file an issue for this?

--Marc



Re: [basex-talk] get and extract .gz files from web

2016-01-26 Thread Andy Bunce
Hi Marco,

I get the same. This works:


"https://wiki.mozilla.org/images/f/ff/Example.json.gz";
!fetch:binary(.)
!archive:extract-text(.)


But this returns empty:


"https://wiki.mozilla.org/images/f/ff/Example.json.gz";
!fetch:binary(.)
!archive:entries(.)

http://basex.org/modules/archive"/>


Expecting to see "example.json"

Could this be a bug?

/Andy





Re: [basex-talk] get and extract .gz files from web

2016-01-26 Thread Maximilian Gärber
Hi,

I think this should work; I use it for OData requests from IIS.

I need to dig through the source, but I used one of the extract-binary
functions.

Regards, Max


Re: [basex-talk] get and extract .gz files from web

2016-01-26 Thread Marc van Grootel
Well, shelling out wasn't so hard. Even on Windows, with Cygwin tools, it's simply:

proc:execute('gunzip', $path-to-gzipped-file)

It worked quite transparently, as it extracts the file and removes the
.gz file. It would be nice if there were a pure XQuery solution, but for
now I'm okay.

Cheers,

-- 
--Marc


[basex-talk] get and extract .gz files from web

2016-01-26 Thread Marc van Grootel
Hi,

I hoped that I could use the archive module to also extract gzipped files.
I need to fetch/sync large XML files from a web service that offers the
option of getting files with gzip encoding (to be nice to the web server).

My first attempt was to explicitly get the .gz file via the URL and then
treat it like an archive binary (extracting it with the recipe from
the archive module page). The entries XML I get is empty, so I suppose
that I cannot read .gz files.

My second attempt was to specify Accept-Encoding = gzip, which indeed
delivers the XML as a binary. But I will probably run into the same issue
when trying to extract it.

Is there a way to do the extraction of .gz encoded files without
having to shell out to some kind of unzipper?

Cheers,
--Marc