Re: Processing a gzipped csv-file by line-by-line

2017-05-11 Thread Laeeth Isharc via Digitalmars-d-learn

On Friday, 12 May 2017 at 00:18:47 UTC, H. S. Teoh wrote:
On Wed, May 10, 2017 at 11:40:08PM +, Jesse Phillips via 
Digitalmars-d-learn wrote: [...]
H.S. Teoh mentioned fastcsv but requires all the data to be in 
memory.


Or you could use std.mmfile.  But if it's decompressed data, 
then it would still need to be small enough to fit in memory.  
Well, in theory you *could* use an anonymous mapping for 
std.mmfile as an OS-backed virtual memory buffer to decompress 
into, but it's questionable whether that's really worth the 
effort.



If you can get the zip to decompress into a range of dchar 
then std.csv will work with it. It is by far not the fastest, 
but much speed is lost since it supports input ranges and 
doesn't specialize on any other range type.


I actually spent some time today to look into whether fastcsv 
can possibly be made to work with general input ranges as long 
as they support slicing... and immediately ran into the 
infamous autodecoding issue: strings are not random-access 
ranges because of autodecoding, so it would require either 
extensive code surgery to make it work, or ugly hacks to bypass 
autodecoding.  I'm quite tempted to attempt the latter, in 
fact, but not now since it's getting busier at work and I don't 
have that much free time to spend on a major refactoring of 
fastcsv.


Alternatively, I could possibly hack together a version of 
fastcsv that took a range of const(char)[] as input (rather 
than a single string), so that, in theory, it could handle 
arbitrarily large input files as long as the caller can provide 
a range of data blocks, e.g., File.byChunk, or in this 
particular case, a range of decompressed data blocks from 
whatever decompressor is used to extract the data.  As long as 
you consume the individual rows without storing references to 
them indefinitely (don't try to make an array of the entire 
dataset), fastcsv's optimizations should still work, since 
unreferenced blocks will eventually get cleaned up by the GC 
when memory runs low.



T


I hacked your code to work with std.experimental.allocator.  If I 
remember correctly, it was a fair bit faster for my use.  Let me know 
if you would like me to tidy it up into a pull request.


Thanks for the library.

Also - sent you an email.  Not sure if you got it.


Laeeth




Re: Processing a gzipped csv-file by line-by-line

2017-05-11 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, May 10, 2017 at 11:40:08PM +, Jesse Phillips via 
Digitalmars-d-learn wrote:
[...]
> H.S. Teoh mentioned fastcsv but requires all the data to be in memory.

Or you could use std.mmfile.  But if it's decompressed data, then it
would still need to be small enough to fit in memory.  Well, in theory
you *could* use an anonymous mapping for std.mmfile as an OS-backed
virtual memory buffer to decompress into, but it's questionable whether
that's really worth the effort.


> If you can get the zip to decompress into a range of dchar then
> std.csv will work with it. It is by far not the fastest, but much
> speed is lost since it supports input ranges and doesn't specialize on
> any other range type.

I actually spent some time today to look into whether fastcsv can
possibly be made to work with general input ranges as long as they
support slicing... and immediately ran into the infamous autodecoding
issue: strings are not random-access ranges because of autodecoding, so
it would require either extensive code surgery to make it work, or ugly
hacks to bypass autodecoding.  I'm quite tempted to attempt the latter,
in fact, but not now since it's getting busier at work and I don't have
that much free time to spend on a major refactoring of fastcsv.

Alternatively, I could possibly hack together a version of fastcsv that
took a range of const(char)[] as input (rather than a single string), so
that, in theory, it could handle arbitrarily large input files as long
as the caller can provide a range of data blocks, e.g., File.byChunk, or
in this particular case, a range of decompressed data blocks from
whatever decompressor is used to extract the data.  As long as you
consume the individual rows without storing references to them
indefinitely (don't try to make an array of the entire dataset),
fastcsv's optimizations should still work, since unreferenced blocks
will eventually get cleaned up by the GC when memory runs low.


T

-- 
The computer is only a tool. Unfortunately, so is the user. -- Armaphine, K5


Re: Unicode Bidi Brackets in D std library?

2017-05-11 Thread Jonathan M Davis via Digitalmars-d-learn
On Thursday, May 11, 2017 7:07:30 PM PDT Las via Digitalmars-d-learn wrote:
> On Thursday, 11 May 2017 at 19:05:46 UTC, Las wrote:
> > On Thursday, 11 May 2017 at 18:59:12 UTC, ag0aep6g wrote:
> >> On 05/11/2017 08:27 PM, Las wrote:
> >>> I see no way of getting
> >>> [these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt)
> >>> properties for unicode code points in the std.uni library.
> >>> How do I get
> >>> these properties?
> >>
> >> Looks like it's too new. std.uni references "Unicode v6.2" as
> >> the standard it complies with, but that BidiBrackets.txt was
> >> "originally created [...] for Unicode 6.3".
> >
> > That's sad.
> > Maybe there's an easy way for me to add it to phobos.
>
> Nearly ten thousand lines in std.uni, great.

Well, Unicode _is_ stupidly complicated. However, also remember that those
lines include the unit tests and documentation, so it's not as much code as
it might first seem.

- Jonathan M Davis



Re: How to avoid throwing an exceptions for a built-in function?

2017-05-11 Thread Jordan Wilson via Digitalmars-d-learn

On Thursday, 11 May 2017 at 18:07:47 UTC, H. S. Teoh wrote:
On Thu, May 11, 2017 at 05:55:03PM +, k-five via 
Digitalmars-d-learn wrote:

On Thursday, 11 May 2017 at 17:18:37 UTC, crimaniak wrote:
> On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
-
> try this: 
> https://dlang.org/phobos/std_exception.html#ifThrown




Worked. Thanks.

import std.stdio;
import std.conv: to;
import std.exception: ifThrown;

void main( string[] args ){

    string str = "string";
    // if an exception is thrown, it is ignored and 0 is returned
    int index = to!int( str ).ifThrown( 0 );
    writeln( "index: ", index );  // 0
}


Keep in mind, though, that you should not do this in an inner 
loop if you care about performance, as throwing / catching 
exceptions will incur a performance hit.  Outside of inner 
loops, though, it probably doesn't matter.



T


This is why I sometimes use isNumeric when I have heaps of 
strings to convert, to reduce the number of exceptions thrown. 
So something like:

int index = (str.isNumeric) ? to!int(str).ifThrown(0) : 0;

Jordan
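
For reference, the isNumeric guard above can be wrapped in a small helper. This is only a sketch: the name toIntOr is made up for illustration, and it assumes std.string.isNumeric is the function meant. Note that isNumeric also accepts strings such as "1.5", which to!int still rejects, so the ifThrown fallback is kept.

```d
import std.conv : to;
import std.exception : ifThrown;
import std.stdio : writeln;
import std.string : isNumeric;

/// Hypothetical helper: converts `s` to int, or returns `fallback`.
/// isNumeric filters out most bad input cheaply, so exceptions are
/// only thrown (and swallowed) for numeric-looking strings that are
/// not valid ints, e.g. "1.5".
int toIntOr(string s, int fallback)
{
    return s.isNumeric ? to!int(s).ifThrown(fallback) : fallback;
}

void main()
{
    writeln(toIntOr("42", 0));
    writeln(toIntOr("abc", -1));
    writeln(toIntOr("1.5", -1));
}
```

With this, obviously non-numeric input never reaches to!int, which is the point of the guard in an inner loop.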


Re: Unicode Bidi Brackets in D std library?

2017-05-11 Thread Las via Digitalmars-d-learn

On Thursday, 11 May 2017 at 19:05:46 UTC, Las wrote:

On Thursday, 11 May 2017 at 18:59:12 UTC, ag0aep6g wrote:

On 05/11/2017 08:27 PM, Las wrote:

I see no way of getting
[these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt)
properties for unicode code points in the std.uni library. 
How do I get

these properties?


Looks like it's too new. std.uni references "Unicode v6.2" as 
the standard it complies with, but that BidiBrackets.txt was 
"originally created [...] for Unicode 6.3".


That's sad.
Maybe there's an easy way for me to add it to phobos.


Nearly ten thousand lines in std.uni, great.


Re: Unicode Bidi Brackets in D std library?

2017-05-11 Thread Las via Digitalmars-d-learn

On Thursday, 11 May 2017 at 18:59:12 UTC, ag0aep6g wrote:

On 05/11/2017 08:27 PM, Las wrote:

I see no way of getting
[these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt)
properties for unicode code points in the std.uni library. How 
do I get

these properties?


Looks like it's too new. std.uni references "Unicode v6.2" as 
the standard it complies with, but that BidiBrackets.txt was 
"originally created [...] for Unicode 6.3".


That's sad.
Maybe there's an easy way for me to add it to phobos.


Re: Unicode Bidi Brackets in D std library?

2017-05-11 Thread ag0aep6g via Digitalmars-d-learn

On 05/11/2017 08:27 PM, Las wrote:

I see no way of getting
[these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt)
properties for unicode code points in the std.uni library. How do I get
these properties?


Looks like it's too new. std.uni references "Unicode v6.2" as the 
standard it complies with, but that BidiBrackets.txt was "originally 
created [...] for Unicode 6.3".


Re: Lookahead in unittest

2017-05-11 Thread Jacob Carlborg via Digitalmars-d-learn

On 2017-05-10 18:17, Stefan Koch wrote:


It looks like unittest blocks are treated like functions.


unittest blocks are lowered to functions.

--
/Jacob Carlborg


Unicode Bidi Brackets in D std library?

2017-05-11 Thread Las via Digitalmars-d-learn
I see no way of getting 
[these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt) properties for unicode code points in the std.uni library. How do I get these properties?


Re: How to avoid throwing an exceptions for a built-in function?

2017-05-11 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, May 11, 2017 at 05:55:03PM +, k-five via Digitalmars-d-learn wrote:
> On Thursday, 11 May 2017 at 17:18:37 UTC, crimaniak wrote:
> > On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
> -
> > try this:
> > https://dlang.org/phobos/std_exception.html#ifThrown
> 
> 
> 
> Worked. Thanks.
> 
> import std.stdio;
> import std.conv: to;
> import std.exception: ifThrown;
> 
> void main( string[] args ){
>   
>   string str = "string";
>   int index = to!int( str ).ifThrown( 0 ); // if an exception was thrown, 
> it
> is ignored and then return ( 0 );
>   writeln( "index: ", index );// 0
> }

Keep in mind, though, that you should not do this in an inner loop if
you care about performance, as throwing / catching exceptions will incur
a performance hit.  Outside of inner loops, though, it probably doesn't
matter.


T

-- 
Ph.D. = Permanent head Damage


Re: How to avoid throwing an exceptions for a built-in function?

2017-05-11 Thread k-five via Digitalmars-d-learn

On Thursday, 11 May 2017 at 17:18:37 UTC, crimaniak wrote:

On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:

-

try this:
https://dlang.org/phobos/std_exception.html#ifThrown




Worked. Thanks.

import std.stdio;
import std.conv: to;
import std.exception: ifThrown;

void main( string[] args ){

    string str = "string";
    // if an exception is thrown, it is ignored and 0 is returned
    int index = to!int( str ).ifThrown( 0 );
    writeln( "index: ", index );  // 0
}




Re: How to avoid throwing an exceptions for a built-in function?

2017-05-11 Thread crimaniak via Digitalmars-d-learn

On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
I have a line of code that uses "to" function in std.conv for a 
purpose like:


int index = to!int( user_apply[ 4 ] ); // string to int

When user_apply[ 4 ] has a value, there is no problem; but 
when it is empty ("") 
it throws a ConvException, and I want to avoid this 
exception.


currently I have to use a dummy catch:
try{
index = to!int( user_apply[ 4 ] );
} catch( ConvException conv_error ){
// nothing
}

I don't need to handle that, so is there any way to prevent this 
exception?


try this:
https://dlang.org/phobos/std_exception.html#ifThrown


Re: How to avoid throwing an exceptions for a built-in function?

2017-05-11 Thread k-five via Digitalmars-d-learn
On Wednesday, 10 May 2017 at 21:44:32 UTC, Andrei Alexandrescu 
wrote:

On 5/10/17 3:40 PM, k-five wrote:

---
I don't need to handle that, so is there any way to prevent this 
exception?


Use the "parse" family: 
https://dlang.org/phobos/std_conv.html#parse -- Andrei

---

This is my answer :). I want a way to convert a string without 
facing any exceptions.


But maybe I do not understand the documentation so well.
It says:
The parse family of functions works quite like the to family, 
except that:


1 - It only works with character ranges as input.
2 - It takes the input by reference. (This means that rvalues 
- such as string literals - are not accepted: use to instead.)
3 - It advances the input to the position following the 
conversion.
4 - It does not throw if it could not convert the entire 
input.


here, number 4: It does not throw if it could not convert the 
entire input.


then it says:
Throws:
A ConvException if the range does not represent a bool.

Well it says different things about throwing!

Also I tested this:

import std.stdio;
import std.conv: parse;

void main( string[] args ){

string str = "string";
int index = parse!int( str );
writeln( "index: ", index );
}

the output:
std.conv.ConvException@/usr/include/dmd/phobos/std/conv.d(2111): 
Unexpected 's' when converting from type string to type int

and so on ...

Please correct me if I am wrong.
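
A short sketch of how the two statements in the documentation fit together, as far as I understand them: parse does not throw when only a prefix of the input converts (it returns the value and advances the input), but it still throws a ConvException when nothing at all can be converted, as with "string" above. The helper name tryParseInt is made up for illustration.

```d
import std.conv : ConvException, parse;
import std.stdio : writeln;

/// Hypothetical helper: returns true and stores the number in `value`
/// if `s` starts with an integer; `s` is advanced past the digits.
/// Returns false if no conversion was possible at all.
bool tryParseInt(ref string s, out int value)
{
    try
    {
        value = parse!int(s);
        return true;
    }
    catch (ConvException)
    {
        return false;
    }
}

void main()
{
    string partly = "123abc";
    int n;
    // Only the leading "123" is consumed; no exception for the rest.
    writeln(tryParseInt(partly, n), " ", n, " rest=", partly);

    string bad = "string";
    // Nothing can be converted here, so parse throws internally.
    writeln(tryParseInt(bad, n));
}
```

So parse alone does not remove the need for a try-catch (or ifThrown) when the input may contain no number at all.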


Re: How to avoid throwing an exceptions for a built-in function?

2017-05-11 Thread k-five via Digitalmars-d-learn

On Wednesday, 10 May 2017 at 21:19:21 UTC, Stanislav Blinov wrote:

On Wednesday, 10 May 2017 at 15:35:24 UTC, k-five wrote:

On Wednesday, 10 May 2017 at 14:27:46 UTC, Stanislav Blinov

---
I don't understand. If you don't want to take care of 
exceptions, then you just don't do anything, simply call 
to!int(str).


Well, I did that. When the string is valid, like "10", 
there is no problem. But when the string is not valid, like 
"abc", the to! function throws an exception.


Why do I not want to take care of that? Because I just need the 
value if the string is valid; otherwise the value of the string 
does not matter.


First I just wrote:
index = to!int( user_apply[ 4 ] );

And this code is a part of a command-line program and the user 
may enter anything. So, for a valid string:

./program '10'   // okay

but for:
./program 'non-numerical' // throws an exception and 10 lines of 
error appear on the screen (console)


I just want to silence this exception. Of course it is useful 
when someone wants to handle it, but in my code I don't need to. 
So I want to silence it without using a try{}catch(){} block. 
I was just wondering whether there is a better way than a dummy 
try-catch block.


Thanks for replying. And I am sorry if you got confused, since I 
am new to writing in English.




Re: alias and UDAs

2017-05-11 Thread ag0aep6g via Digitalmars-d-learn

On 05/11/2017 12:39 PM, Andre Pany wrote:

in this example, both asserts fail. Is my assumption right that UDAs on
aliases have no effect? If so, I would like to see a compiler warning.

But anyway, I do not understand why the second assertion fails. Are UDAs
on arrays not allowed?

import std.traits: hasUDA;

enum Flattened;

struct Foo
{
    int bar;
}

@Flattened alias FooList = Foo[];

struct Baz
{
    FooList fooList1;
    @Flattened FooList[] fooList2;
}

void main()
{
    Baz baz;
    static assert(hasUDA!(baz.fooList1, "Flattened")); // => false
    static assert(hasUDA!(baz.fooList2, "Flattened")); // => false
}


1) You have to test against `Flattened`, not `"Flattened"`. A string is 
a valid UDA, but you're not using the string on the declarations.


When you fix this, the second assert passes.

2) `Baz.fooList1` doesn't have any attributes. Attributes apply to 
declarations. If it's valid, the attribute on `FooList` applies only to 
`FooList`. It doesn't transfer to `Baz.fooList1`.


If anything, you could assert that `hasUDA!(FooList, Flattened)` holds. 
Maybe you could, if it compiled.


3) Why does `hasUDA!(FooList, Flattened)` fail to compile?

The error message reads: "template instance hasUDA!(Foo[], Flattened) 
does not match template declaration hasUDA(alias symbol, alias attribute)".


We see that `FooList` has been replaced by `Foo[]`. It's clear then why 
the instantiation fails: `Foo[]` isn't a symbol.


Unfortunately, the spec is a bit muddy on this topic. On the one hand it 
says that "AliasDeclarations create a symbol", but it also says that 
"Aliased types are semantically identical to the types they are aliased 
to" [1].


In practice, the compiler doesn't seem to create a symbol. The alias 
identifier is simply replaced with the aliased thing, and you can't use 
the alias identifier as a symbol.


That means, you might be able to add an attribute to `FooList`, but you 
can't get back to it, because whenever you use `FooList` it's always 
replaced by `Foo[]`. And `Foo[]` doesn't have the attribute, of course.


I agree that it would probably make sense to disallow putting attributes 
on aliases. You can also mark aliases `const`, `static`, `pure`, etc. 
And they all have no effect.



[1] http://dlang.org/spec/declaration.html#AliasDeclaration
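
To make points 1) and 2) concrete, here is a minimal sketch of the example with the attribute removed from the alias and the checks written against the `Flattened` symbol instead of a string. It assumes `hasUDA` accepts `Baz.fooList1` as a symbol; it is a reduced version of the original code, not a proposed fix for it.

```d
import std.traits : hasUDA;

enum Flattened;

struct Foo
{
    int bar;
}

// No attribute here: per the discussion above it would have no effect.
alias FooList = Foo[];

struct Baz
{
    FooList fooList1;               // declaration without attributes
    @Flattened FooList[] fooList2;  // attribute sits on the declaration
}

// The attribute is found only where it was put on the declaration.
static assert(!hasUDA!(Baz.fooList1, Flattened));
static assert( hasUDA!(Baz.fooList2, Flattened));

void main()
{
}
```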


Re: alias and UDAs

2017-05-11 Thread Andre Pany via Digitalmars-d-learn

On Thursday, 11 May 2017 at 10:57:22 UTC, Stanislav Blinov wrote:

On Thursday, 11 May 2017 at 10:39:03 UTC, Andre Pany wrote:

[...]


It should've been

alias FooList = @Flattened Foo[];

which will generate a compile-time error (UDAs not allowed for 
alias declarations).


And then:

static assert(hasUDA!(baz.fooList2, Flattened));

No quotes, since Flattened is an enum, not a string


Thanks for the explanation. I think I will create a bug report 
for this statement:

@Flattened alias FooList = Foo[];

The UDA has no effect as far as I understand.

Kind regards
André


Re: alias and UDAs

2017-05-11 Thread Stanislav Blinov via Digitalmars-d-learn

On Thursday, 11 May 2017 at 10:39:03 UTC, Andre Pany wrote:

Hi,

in this example, both asserts fail. Is my assumption right 
that UDAs on aliases have no effect? If so, I would like to see a 
compiler warning.


But anyway, I do not understand why the second assertion fails. 
Are UDAs on arrays not allowed?


import std.traits: hasUDA;

enum Flattened;

struct Foo
{
    int bar;
}

@Flattened alias FooList = Foo[];

struct Baz
{
    FooList fooList1;
    @Flattened FooList[] fooList2;
}

void main()
{
    Baz baz;
    static assert(hasUDA!(baz.fooList1, "Flattened")); // => false
    static assert(hasUDA!(baz.fooList2, "Flattened")); // => false
}

Kind regards
André


It should've been

alias FooList = @Flattened Foo[];

which will generate a compile-time error (UDAs not allowed for 
alias declarations).


And then:

static assert(hasUDA!(baz.fooList2, Flattened));

No quotes, since Flattened is an enum, not a string


alias and UDAs

2017-05-11 Thread Andre Pany via Digitalmars-d-learn

Hi,

in this example, both asserts fail. Is my assumption right that 
UDAs on aliases have no effect? If so, I would like to see a 
compiler warning.


But anyway, I do not understand why the second assertion fails. 
Are UDAs on arrays not allowed?


import std.traits: hasUDA;

enum Flattened;

struct Foo
{
    int bar;
}

@Flattened alias FooList = Foo[];

struct Baz
{
    FooList fooList1;
    @Flattened FooList[] fooList2;
}

void main()
{
    Baz baz;
    static assert(hasUDA!(baz.fooList1, "Flattened")); // => false
    static assert(hasUDA!(baz.fooList2, "Flattened")); // => false
}

Kind regards
André


Re: struct File. property size.

2017-05-11 Thread AntonSotov via Digitalmars-d-learn

On Thursday, 11 May 2017 at 08:42:26 UTC, Nicholas Wilson wrote:

Are you on Windows, perchance? IIRC, when compiling for 32-bit 
it doesn't use the 64-bit C file functions, so that will not 
work.


Yes, Windows.  OK, I understand.




Re: struct File. property size.

2017-05-11 Thread Nicholas Wilson via Digitalmars-d-learn

On Thursday, 11 May 2017 at 07:24:00 UTC, AntonSotov wrote:

import std.stdio;

int main()
{
    auto big = File("bigfile", "r+"); // bigfile size 20 GB
    writeln(big.size);  // ERROR!
    return 0;
}

//
std.exception.ErrnoException@std\stdio.d(1029): Could not seek 
in file `bigfile` (Invalid argument)


Can I not work with a large file?
This is a 32-bit executable.


Are you on Windows, perchance? IIRC, when compiling for 32-bit 
it doesn't use the 64-bit C file functions, so that will not work.
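
If only the size is needed, one thing that might be worth trying is std.file.getSize, which takes a path, returns a ulong and does not go through File's seek. Whether it sidesteps the 32-bit limitation on a given toolchain is an assumption to verify, not something I have tested with a 20 GB file. The helper and the scratch file name below are made up so that the sketch runs on its own.

```d
import std.file : getSize, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical helper: writes `n` zero bytes to a scratch file and
/// returns the size that std.file.getSize reports for it.
ulong sizeOfScratchFile(size_t n)
{
    auto path = buildPath(tempDir(), "getsize_demo.bin"); // made-up name
    write(path, new ubyte[](n));
    scope (exit) remove(path);   // clean up the scratch file
    return getSize(path);        // size in bytes, as ulong
}

void main()
{
    writeln(sizeOfScratchFile(1024), " bytes");
}
```

For the original program this would mean calling getSize("bigfile") instead of big.size.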


Re: struct File. property size.

2017-05-11 Thread Stefan Koch via Digitalmars-d-learn

On Thursday, 11 May 2017 at 07:24:00 UTC, AntonSotov wrote:

import std.stdio;

int main()
{
    auto big = File("bigfile", "r+"); // bigfile size 20 GB
    writeln(big.size);  // ERROR!
    return 0;
}

//
std.exception.ErrnoException@std\stdio.d(1029): Could not seek 
in file `bigfile` (Invalid argument)


Can I not work with a large file?
This is a 32-bit executable.


It seems you cannot :)
Files bigger than 4 GB are still problematic on many platforms.


struct File. property size.

2017-05-11 Thread AntonSotov via Digitalmars-d-learn

import std.stdio;

int main()
{
    auto big = File("bigfile", "r+"); // bigfile size 20 GB
    writeln(big.size);  // ERROR!
    return 0;
}

//
std.exception.ErrnoException@std\stdio.d(1029): Could not seek in 
file `bigfile` (Invalid argument)


Can I not work with a large file?
This is a 32-bit executable.



Re: Processing a gzipped csv-file by line-by-line

2017-05-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 10 May 2017 at 22:20:52 UTC, Nordlöw wrote:
What's the fastest way to decompress a gzipped csv-file on the 
fly and process it line by line?


Is it possible to combine

http://dlang.org/phobos/std_zlib.html

with some stream variant of

File(path).byLineFast

?


I was curious what byLineFast was; I'm guessing it's from here: 
https://github.com/biod/BioD/blob/master/bio/core/utils/bylinefast.d.


I didn't test it, but it appears it may pre-date the speed 
improvements made to std.stdio.byLine perhaps a year and a half 
ago. If so, it might be worth comparing it to the current Phobos 
version, and of course iopipe.


As mentioned in one of the other replies, byLine and variants 
aren't appropriate for CSV with escapes. For that, a real CSV 
parser is needed. As an alternative, run a converter that 
converts from csv to another format.


--Jon
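
For the original question, a rough sketch of on-the-fly decompression with std.zlib, one block at a time, might look like the following. It is not tuned for speed and has not been compared with byLineFast or iopipe. The names eachLine and gzip are made up; gzip only builds a small gzip stream in memory so that the sketch runs without a file. Lines are split on '\n' only, so, as noted above, CSV with escaped newlines would still need a real CSV parser on top.

```d
import std.range : chunks;
import std.stdio : writeln;
import std.string : indexOf;
import std.zlib : Compress, HeaderFormat, UnCompress;

/// Hypothetical helper: decompresses a range of gzip data blocks and
/// calls `sink` once per line, without holding the whole file in memory.
void eachLine(Blocks)(Blocks blocks, void delegate(const(char)[]) sink)
{
    auto inflater = new UnCompress(HeaderFormat.gzip);
    char[] pending; // decompressed text that has no '\n' yet

    void drain(const(void)[] raw)
    {
        pending ~= cast(const(char)[]) raw;
        for (auto nl = pending.indexOf('\n'); nl >= 0; nl = pending.indexOf('\n'))
        {
            sink(pending[0 .. nl]);
            pending = pending[nl + 1 .. $];
        }
    }

    foreach (block; blocks)
        drain(inflater.uncompress(block));
    drain(inflater.flush());
    if (pending.length)
        sink(pending); // last line without a trailing newline
}

/// Demo-only helper: gzip-compresses `text` in memory.
const(ubyte)[] gzip(const(char)[] text)
{
    auto deflater = new Compress(HeaderFormat.gzip);
    auto head = cast(const(ubyte)[]) deflater.compress(text);
    auto tail = cast(const(ubyte)[]) deflater.flush();
    return head ~ tail;
}

void main()
{
    string[] rows;
    // 7-byte blocks stand in for File(path).byChunk(...) on a real file.
    eachLine(gzip("a,b\n1,2\n3,4").chunks(7),
             (const(char)[] line) { rows ~= line.idup; });
    writeln(rows);
}
```

With a real file, the first argument would be something like File(path, "rb").byChunk(64 * 1024); each line passed to the sink should be copied (as with idup here) if it needs to outlive the call.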