Re: Processing a gzipped csv-file by line-by-line
On Friday, 12 May 2017 at 00:18:47 UTC, H. S. Teoh wrote:
> On Wed, May 10, 2017 at 11:40:08PM +, Jesse Phillips via Digitalmars-d-learn wrote:
> [...]
>> H.S. Teoh mentioned fastcsv but it requires all the data to be in memory.
>
> Or you could use std.mmfile. But if it's decompressed data, then it would still need to be small enough to fit in memory. Well, in theory you *could* use an anonymous mapping for std.mmfile as an OS-backed virtual memory buffer to decompress into, but it's questionable whether that's really worth the effort.
>
>> If you can get the zip to decompress into a range of dchar then std.csv will work with it. It is by far not the fastest, but much speed is lost since it supports input ranges and doesn't specialize on any other range type.
>
> I actually spent some time today to look into whether fastcsv can possibly be made to work with general input ranges as long as they support slicing... and immediately ran into the infamous autodecoding issue: strings are not random-access ranges because of autodecoding, so it would require either extensive code surgery to make it work, or ugly hacks to bypass autodecoding. I'm quite tempted to attempt the latter, in fact, but not now since it's getting busier at work and I don't have that much free time to spend on a major refactoring of fastcsv.
>
> Alternatively, I could possibly hack together a version of fastcsv that took a range of const(char)[] as input (rather than a single string), so that, in theory, it could handle arbitrarily large input files as long as the caller can provide a range of data blocks, e.g., File.byChunk, or in this particular case, a range of decompressed data blocks from whatever decompressor is used to extract the data. As long as you consume the individual rows without storing references to them indefinitely (don't try to make an array of the entire dataset), fastcsv's optimizations should still work, since unreferenced blocks will eventually get cleaned up by the GC when memory runs low.
>
> T

I hacked your code to work with std.experimental.allocator. If I remember correctly, it was a fair bit faster for my use. Let me know if you would like me to tidy it up into a pull request. Thanks for the library.

Also, I sent you an email. Not sure if you got it.

Laeeth
Re: Processing a gzipped csv-file by line-by-line
On Wed, May 10, 2017 at 11:40:08PM +, Jesse Phillips via Digitalmars-d-learn wrote:
[...]
> H.S. Teoh mentioned fastcsv but it requires all the data to be in memory.

Or you could use std.mmfile. But if it's decompressed data, then it would still need to be small enough to fit in memory. Well, in theory you *could* use an anonymous mapping for std.mmfile as an OS-backed virtual memory buffer to decompress into, but it's questionable whether that's really worth the effort.

> If you can get the zip to decompress into a range of dchar then
> std.csv will work with it. It is by far not the fastest, but much
> speed is lost since it supports input ranges and doesn't specialize on
> any other range type.

I actually spent some time today to look into whether fastcsv can possibly be made to work with general input ranges as long as they support slicing... and immediately ran into the infamous autodecoding issue: strings are not random-access ranges because of autodecoding, so it would require either extensive code surgery to make it work, or ugly hacks to bypass autodecoding. I'm quite tempted to attempt the latter, in fact, but not now since it's getting busier at work and I don't have that much free time to spend on a major refactoring of fastcsv.

Alternatively, I could possibly hack together a version of fastcsv that took a range of const(char)[] as input (rather than a single string), so that, in theory, it could handle arbitrarily large input files as long as the caller can provide a range of data blocks, e.g., File.byChunk, or in this particular case, a range of decompressed data blocks from whatever decompressor is used to extract the data. As long as you consume the individual rows without storing references to them indefinitely (don't try to make an array of the entire dataset), fastcsv's optimizations should still work, since unreferenced blocks will eventually get cleaned up by the GC when memory runs low.

T

--
The computer is only a tool. Unfortunately, so is the user. -- Armaphine, K5
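A minimal sketch of the "range of decompressed data blocks" idea, assuming std.zlib.UnCompress with HeaderFormat.gzip fed from File.byChunk. The helper name eachGzLine, the 64 KiB chunk size and the delegate-based interface are illustrative choices, not part of Phobos or fastcsv. Lines are split on '\n' only, so a trailing '\r' is kept and quoted CSV fields that contain newlines are not handled; a real CSV parser would still be needed for those.

```d
import std.stdio : File;
import std.zlib : HeaderFormat, UnCompress;

/// Hypothetical helper: decompresses a gzip file chunk by chunk and
/// hands each '\n'-terminated line to `sink`.
/// The slice passed to `sink` is only valid during the call.
void eachGzLine(string path, scope void delegate(const(char)[] line) sink)
{
    auto decomp = new UnCompress(HeaderFormat.gzip);
    char[] pending; // decompressed bytes after the last newline seen so far

    void feed(const(void)[] data)
    {
        pending ~= cast(const(char)[]) data;
        size_t start = 0;
        foreach (i, char c; pending) // iterate code units, no autodecoding
        {
            if (c == '\n')
            {
                sink(pending[start .. i]);
                start = i + 1;
            }
        }
        // Keep only the incomplete tail for the next block.
        pending = pending[start .. $].dup;
    }

    foreach (chunk; File(path, "rb").byChunk(64 * 1024))
        feed(decomp.uncompress(chunk));
    feed(decomp.flush());

    // A final line without a trailing newline.
    if (pending.length)
        sink(pending);
}
```

Called as eachGzLine("data.csv.gz", (line) { /* process line */ }), this keeps only one decompressed block plus one partial line alive at a time, which is the property the paragraph above relies on. The file name is a placeholder.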
Re: Unicode Bidi Brackets in D std library?
On Thursday, May 11, 2017 7:07:30 PM PDT Las via Digitalmars-d-learn wrote:
> On Thursday, 11 May 2017 at 19:05:46 UTC, Las wrote:
> > On Thursday, 11 May 2017 at 18:59:12 UTC, ag0aep6g wrote:
> >> On 05/11/2017 08:27 PM, Las wrote:
> >>> I see no way of getting
> >>> [these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt)
> >>> properties for unicode code points in the std.uni library.
> >>> How do I get these properties?
> >>
> >> Looks like it's too new. std.uni references "Unicode v6.2" as
> >> the standard it complies with, but that BidiBrackets.txt was
> >> "originally created [...] for Unicode 6.3".
> >
> > That's sad.
> > Maybe there's an easy way for me to add it to phobos.
>
> Nearly ten thousand lines in std.uni, great.

Well, Unicode _is_ stupidly complicated. However, also remember that those lines include the unit tests and documentation, so it's not as much code as it might first seem.

- Jonathan M Davis
Re: How to avoid throwing an exceptions for a built-in function?
On Thursday, 11 May 2017 at 18:07:47 UTC, H. S. Teoh wrote:
> On Thu, May 11, 2017 at 05:55:03PM +, k-five via Digitalmars-d-learn wrote:
>> On Thursday, 11 May 2017 at 17:18:37 UTC, crimaniak wrote:
>>> On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
>>> -
>>> try this:
>>> https://dlang.org/phobos/std_exception.html#ifThrown
>>
>> Worked. Thanks.
>>
>>     import std.stdio;
>>     import std.conv: to;
>>     import std.exception: ifThrown;
>>
>>     void main( string[] args ){
>>         string str = "string";
>>         // if an exception is thrown, it is ignored and 0 is returned
>>         int index = to!int( str ).ifThrown( 0 );
>>         writeln( "index: ", index ); // 0
>>     }
>
> Keep in mind, though, that you should not do this in an inner loop if you care about performance, as throwing / catching exceptions will incur a performance hit. Outside of inner loops, though, it probably doesn't matter.
>
> T

This is why I sometimes use isNumeric if I have heaps of strings I need to convert, to reduce the number of exceptions. So something like:

    int index = (str.isNumeric) ? to!int(str).ifThrown(0) : 0;

Jordan
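The isNumeric pre-check can be wrapped up as a small function. This is only a sketch: the name toIntOr is made up for illustration, and isNumeric is assumed to be std.string.isNumeric (the string predicate, not the type trait in std.traits). Because isNumeric also accepts floating-point text such as "1.5", the ifThrown fallback is still needed for the strings that pass the check but are not valid ints.

```d
import std.conv : to;
import std.exception : ifThrown;
import std.string : isNumeric;

/// Hypothetical helper: converts `s` to int, returning `fallback`
/// instead of throwing when the string is not a valid int.
int toIntOr(string s, int fallback)
{
    // The cheap check filters out most bad input without an exception;
    // ifThrown covers what isNumeric lets through (e.g. "1.5").
    return s.isNumeric ? s.to!int.ifThrown(fallback) : fallback;
}
```

With this, toIntOr("12", 0) gives 12, while "", "abc" and "1.5" all yield the fallback; only the last one pays for a thrown exception.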
Re: Unicode Bidi Brackets in D std library?
On Thursday, 11 May 2017 at 19:05:46 UTC, Las wrote:
> On Thursday, 11 May 2017 at 18:59:12 UTC, ag0aep6g wrote:
>> On 05/11/2017 08:27 PM, Las wrote:
>>> I see no way of getting [these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt) properties for unicode code points in the std.uni library. How do I get these properties?
>>
>> Looks like it's too new. std.uni references "Unicode v6.2" as the standard it complies with, but that BidiBrackets.txt was "originally created [...] for Unicode 6.3".
>
> That's sad.
> Maybe there's an easy way for me to add it to phobos.

Nearly ten thousand lines in std.uni, great.
Re: Unicode Bidi Brackets in D std library?
On Thursday, 11 May 2017 at 18:59:12 UTC, ag0aep6g wrote:
> On 05/11/2017 08:27 PM, Las wrote:
>> I see no way of getting [these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt) properties for unicode code points in the std.uni library. How do I get these properties?
>
> Looks like it's too new. std.uni references "Unicode v6.2" as the standard it complies with, but that BidiBrackets.txt was "originally created [...] for Unicode 6.3".

That's sad.
Maybe there's an easy way for me to add it to phobos.
Re: Unicode Bidi Brackets in D std library?
On 05/11/2017 08:27 PM, Las wrote:
> I see no way of getting [these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt) properties for unicode code points in the std.uni library. How do I get these properties?

Looks like it's too new. std.uni references "Unicode v6.2" as the standard it complies with, but that BidiBrackets.txt was "originally created [...] for Unicode 6.3".
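Until std.uni covers these properties, one option is to read BidiBrackets.txt directly. The sketch below assumes the usual UCD layout for that file: three semicolon-separated fields per data line (code point in hex, paired bracket in hex, type 'o' or 'c') with '#' starting a comment, e.g. "0028; 0029; o # LEFT PARENTHESIS". The struct BracketPair and the function parseBidiBracketLine are hypothetical names, not Phobos API.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.conv : to;
import std.string : indexOf, strip;

/// Hypothetical record for one data line of BidiBrackets.txt.
struct BracketPair
{
    dchar bracket; // the code point itself
    dchar paired;  // its Bidi_Paired_Bracket counterpart
    char type;     // 'o' = open, 'c' = close
}

/// Parses one line; returns false for blank and comment-only lines.
/// Malformed hex fields make std.conv.to throw a ConvException.
bool parseBidiBracketLine(const(char)[] line, out BracketPair pair)
{
    // Drop the trailing comment, if any.
    immutable hash = line.indexOf('#');
    if (hash >= 0)
        line = line[0 .. hash];

    auto fields = line.splitter(';').map!(f => f.strip).array;
    if (fields.length != 3 || fields[2].length != 1)
        return false;

    pair.bracket = cast(dchar) fields[0].to!uint(16);
    pair.paired = cast(dchar) fields[1].to!uint(16);
    pair.type = fields[2][0];
    return true;
}
```

Feeding every line of the downloaded file through this function and storing the results in an associative array keyed by `bracket` would give a simple lookup table.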
Re: Lookahead in unittest
On 2017-05-10 18:17, Stefan Koch wrote:
> It looks like this unittest block is treated like a function.

unittest blocks are lowered to functions.

--
/Jacob Carlborg
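A small illustration of what that lowering means in practice, assuming the question was about using a symbol before its declaration: inside a function body, and therefore inside a unittest block, a local declaration has to appear before its first use, unlike at module scope. The names below are made up for the example.

```d
/// Ordinary function: local declarations must precede their use.
int declareThenUse()
{
    // return helper(); // would not compile here: helper is not declared yet
    int helper() { return 42; }
    return helper();
}

unittest
{
    // The same rule applies here, because this block becomes a function.
    int twice(int x) { return 2 * x; }
    assert(twice(21) == 42);
    assert(declareThenUse() == 42);
}
```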
Unicode Bidi Brackets in D std library?
I see no way of getting [these](http://unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt) properties for Unicode code points in the std.uni library. How do I get these properties?
Re: How to avoid throwing an exceptions for a built-in function?
On Thu, May 11, 2017 at 05:55:03PM +, k-five via Digitalmars-d-learn wrote:
> On Thursday, 11 May 2017 at 17:18:37 UTC, crimaniak wrote:
> > On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
> > -
> > try this:
> > https://dlang.org/phobos/std_exception.html#ifThrown
>
> Worked. Thanks.
>
>     import std.stdio;
>     import std.conv: to;
>     import std.exception: ifThrown;
>
>     void main( string[] args ){
>         string str = "string";
>         // if an exception is thrown, it is ignored and 0 is returned
>         int index = to!int( str ).ifThrown( 0 );
>         writeln( "index: ", index ); // 0
>     }

Keep in mind, though, that you should not do this in an inner loop if you care about performance, as throwing / catching exceptions will incur a performance hit. Outside of inner loops, though, it probably doesn't matter.

T

--
Ph.D. = Permanent head Damage
Re: How to avoid throwing an exceptions for a built-in function?
On Thursday, 11 May 2017 at 17:18:37 UTC, crimaniak wrote:
> On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
> -
> try this:
> https://dlang.org/phobos/std_exception.html#ifThrown

Worked. Thanks.

    import std.stdio;
    import std.conv: to;
    import std.exception: ifThrown;

    void main( string[] args ){
        string str = "string";
        // if an exception is thrown, it is ignored and 0 is returned
        int index = to!int( str ).ifThrown( 0 );
        writeln( "index: ", index ); // 0
    }
Re: How to avoid throwing an exceptions for a built-in function?
On Wednesday, 10 May 2017 at 12:40:41 UTC, k-five wrote:
> I have a line of code that uses the "to" function in std.conv like this:
>
>     int index = to!int( user_apply[ 4 ] ); // string to int
>
> When user_apply[ 4 ] has a value, there is no problem; but when it is empty ("") it throws a ConvException, and I want to avoid this exception.
>
> Currently I have to use a dummy catch:
>
>     try{
>         index = to!int( user_apply[ 4 ] );
>     } catch( ConvException conv_error ){
>         // nothing
>     }
>
> I don't need to handle that, so is there any way to prevent this exception?

try this:
https://dlang.org/phobos/std_exception.html#ifThrown
Re: How to avoid throwing an exceptions for a built-in function?
On Wednesday, 10 May 2017 at 21:44:32 UTC, Andrei Alexandrescu wrote:
> On 5/10/17 3:40 PM, k-five wrote:
> ---
>> I don't need to handle that, so is there any way to prevent this exception?
>
> Use the "parse" family: https://dlang.org/phobos/std_conv.html#parse -- Andrei
---

This is my answer :). I want a way to convert a string without facing any exceptions.

But maybe I do not understand the documentation so well. It says:

The parse family of functions works quite like the to family, except that:
1 - It only works with character ranges as input.
2 - It takes the input by reference. (This means that rvalues - such as string literals - are not accepted: use to instead.)
3 - It advances the input to the position following the conversion.
4 - It does not throw if it could not convert the entire input.

Here, number 4 says: It does not throw if it could not convert the entire input.

Then it says:
Throws: A ConvException if the range does not represent a bool.

So it says different things about throwing!

Also I tested this:

    import std.stdio;
    import std.conv: parse;

    void main( string[] args ){
        string str = "string";
        int index = parse!int( str );
        writeln( "index: ", index );
    }

The output:
std.conv.ConvException@/usr/include/dmd/phobos/std/conv.d(2111): Unexpected 's' when converting from type string to type int
and so on ...

Please correct me if I am wrong.
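The two statements in the documentation describe different situations, and a short sketch may make that clearer. parse does not throw when only a leading part of the input converts: it returns that part and leaves the rest in the string. It does throw when nothing at the front of the input can be converted at all, which is the "string" case tested above. The helper name leadingIntOr is hypothetical.

```d
import std.conv : ConvException, parse;

/// Hypothetical helper: parses a leading int from `s`, advancing `s`
/// past the digits that were consumed; returns `fallback` if the
/// input does not start with a convertible number.
int leadingIntOr(ref string s, int fallback)
{
    try
    {
        return parse!int(s);
    }
    catch (ConvException)
    {
        return fallback;
    }
}
```

For s = "42abc" the call returns 42 and leaves s == "abc" without throwing (point 4 in the list), whereas for "string" the ConvException is raised inside parse and the fallback is returned.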
Re: How to avoid throwing an exceptions for a built-in function?
On Wednesday, 10 May 2017 at 21:19:21 UTC, Stanislav Blinov wrote:
> On Wednesday, 10 May 2017 at 15:35:24 UTC, k-five wrote:
>> On Wednesday, 10 May 2017 at 14:27:46 UTC, Stanislav Blinov
---
> I don't understand. If you don't want to take care of exceptions, then you just don't do anything, simply call to!int(str).

Well, I did that. When the string is valid, like "10", there are no problems. But when the string is not valid, like "abc", the to! function throws an exception.

Why do I not want to take care of that? Because I only need the value if the string is valid; otherwise it does not matter what the string contains.

First I just wrote:

    index = to!int( user_apply[ 4 ] );

This code is part of a command-line program and the user may enter anything. So, for a valid string:

    ./program '10' // okay

but for:

    ./program 'non-numerical' // throws an exception and 10 lines of error output appear on the console

I just want to silence this exception. Of course it is useful when someone wants to handle it, but in my code I do not need to handle it. So I want to silence it without using a try{}catch(){} block. I was wondering whether there is a better way than a dummy try-catch block.

Thanks for replying. And I am sorry if my writing confused you; I am new to writing in English.
Re: alias and UDAs
On 05/11/2017 12:39 PM, Andre Pany wrote:
> in this example, both asserts fail. Is my assumption right, that UDA on alias have no effect? If yes, I would like to see a compiler warning.
>
> But anyway, I do not understand why the second assertion fails. Are UDAs on arrays not allowed?
>
>     import std.traits: hasUDA;
>
>     enum Flattened;
>
>     struct Foo
>     {
>         int bar;
>     }
>
>     @Flattened alias FooList = Foo[];
>
>     struct Baz
>     {
>         FooList fooList1;
>         @Flattened FooList[] fooList2;
>     }
>
>     void main()
>     {
>         Baz baz;
>         static assert(hasUDA!(baz.fooList1, "Flattened")); // => false
>         static assert(hasUDA!(baz.fooList2, "Flattened")); // => false
>     }

1) You have to test against `Flattened`, not `"Flattened"`. A string is a valid UDA, but you're not using the string on the declarations. When you fix this, the second assert passes.

2) `Baz.fooList1` doesn't have any attributes. Attributes apply to declarations. If it's valid, the attribute on `FooList` applies only to `FooList`. It doesn't transfer to `Baz.fooList1`. If anything, you could assert that `hasUDA!(FooList, Flattened)` holds. Maybe you could, if it compiled.

3) Why does `hasUDA!(FooList, Flattened)` fail to compile? The error message reads: "template instance hasUDA!(Foo[], Flattened) does not match template declaration hasUDA(alias symbol, alias attribute)". We see that `FooList` has been replaced by `Foo[]`. It's clear then why the instantiation fails: `Foo[]` isn't a symbol.

Unfortunately, the spec is a bit muddy on this topic. On the one hand it says that "AliasDeclarations create a symbol", but it also says that "Aliased types are semantically identical to the types they are aliased to" [1]. In practice, the compiler doesn't seem to create a symbol. The alias identifier is simply replaced with the aliased thing, and you can't use the alias identifier as a symbol.

That means, you might be able to add an attribute to `FooList`, but you can't get back to it, because whenever you use `FooList` it's always replaced by `Foo[]`. And `Foo[]` doesn't have the attribute, of course.

I agree that it would probably make sense to disallow putting attributes on aliases. You can also mark aliases `const`, `static`, `pure`, etc. And they all have no effect.

[1] http://dlang.org/spec/declaration.html#AliasDeclaration
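Points 1 and 2 can be checked with a reduced version of the original example. This is a sketch under the assumptions stated in the reply: the attribute is passed as the symbol `Flattened` rather than as a string, and the attribute on the alias is left out because it has no observable effect. The two enum names at the end are added only so the results are easy to inspect.

```d
import std.traits : hasUDA;

enum Flattened;

struct Foo
{
    int bar;
}

alias FooList = Foo[];

struct Baz
{
    FooList fooList1;              // no attribute on this declaration
    @Flattened FooList[] fooList2; // attribute attached to the field
}

// Test against the symbol Flattened, not the string "Flattened".
enum fooList1IsFlattened = hasUDA!(Baz.fooList1, Flattened);
enum fooList2IsFlattened = hasUDA!(Baz.fooList2, Flattened);

static assert(!fooList1IsFlattened);
static assert(fooList2IsFlattened);
```

The attribute is found on `fooList2` because it was written on that field's declaration; nothing is inherited from the type or from an alias of it.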
Re: alias and UDAs
On Thursday, 11 May 2017 at 10:57:22 UTC, Stanislav Blinov wrote:
> On Thursday, 11 May 2017 at 10:39:03 UTC, Andre Pany wrote:
>> [...]
>
> It should've been
>
>     alias FooList = @Flattened Foo[];
>
> which will generate a compile-time error (UDAs not allowed for alias declarations).
>
> And then:
>
>     static assert(hasUDA!(baz.fooList2, Flattened));
>
> No quotes, since Flattened is an enum, not a string

Thanks for the explanation. I think I will create a bug report for this statement:

    @Flattened alias FooList = Foo[];

The UDA has no effect as far as I understand.

Kind regards
André
Re: alias and UDAs
On Thursday, 11 May 2017 at 10:39:03 UTC, Andre Pany wrote:
> Hi,
>
> in this example, both asserts fail. Is my assumption right, that UDA on alias have no effect? If yes, I would like to see a compiler warning.
>
> But anyway, I do not understand why the second assertion fails. Are UDAs on arrays not allowed?
>
>     import std.traits: hasUDA;
>
>     enum Flattened;
>
>     struct Foo
>     {
>         int bar;
>     }
>
>     @Flattened alias FooList = Foo[];
>
>     struct Baz
>     {
>         FooList fooList1;
>         @Flattened FooList[] fooList2;
>     }
>
>     void main()
>     {
>         Baz baz;
>         static assert(hasUDA!(baz.fooList1, "Flattened")); // => false
>         static assert(hasUDA!(baz.fooList2, "Flattened")); // => false
>     }
>
> Kind regards
> André

It should've been

    alias FooList = @Flattened Foo[];

which will generate a compile-time error (UDAs not allowed for alias declarations).

And then:

    static assert(hasUDA!(baz.fooList2, Flattened));

No quotes, since Flattened is an enum, not a string
alias and UDAs
Hi,

in this example, both asserts fail. Is my assumption right, that UDA on alias have no effect? If yes, I would like to see a compiler warning.

But anyway, I do not understand why the second assertion fails. Are UDAs on arrays not allowed?

    import std.traits: hasUDA;

    enum Flattened;

    struct Foo
    {
        int bar;
    }

    @Flattened alias FooList = Foo[];

    struct Baz
    {
        FooList fooList1;
        @Flattened FooList[] fooList2;
    }

    void main()
    {
        Baz baz;
        static assert(hasUDA!(baz.fooList1, "Flattened")); // => false
        static assert(hasUDA!(baz.fooList2, "Flattened")); // => false
    }

Kind regards
André
Re: struct File. property size.
On Thursday, 11 May 2017 at 08:42:26 UTC, Nicholas Wilson wrote:
> Are you on Windows perchance? IIRC, when compiling for 32 bit it doesn't use the 64 bit C file functions, so that will not work.

Yes, Windows.
OK, I understand.
Re: struct File. property size.
On Thursday, 11 May 2017 at 07:24:00 UTC, AntonSotov wrote:
>     import std.stdio;
>
>     int main()
>     {
>         auto big = File("bigfile", "r+"); // bigfile size 20 GB
>         writeln(big.size); // ERROR!
>         return 0;
>     }
>
>     // std.exception.ErrnoException@std\stdio.d(1029): Could not seek in file `bigfile` (Invalid argument)
>
> Can I not work with a large file?
> 32 bit executable.

Are you on Windows perchance? IIRC, when compiling for 32 bit it doesn't use the 64 bit C file functions, so that will not work.
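If only the size is needed, one thing that may be worth trying is std.file.getSize, which takes a file name and returns a ulong instead of seeking on an open C stream the way File.size does. This is a sketch rather than a confirmed fix: it has not been tested here against a 20 GB file in a 32-bit Windows build, and the wrapper name sizeByName is made up for the example.

```d
import std.file : getSize;

/// Hypothetical wrapper: asks the file system for the size by name
/// instead of seeking to the end of an open File.
ulong sizeByName(string path)
{
    return getSize(path);
}
```

For the program above that would mean replacing big.size with sizeByName("bigfile"); reading or seeking beyond 4 GB through File in such a build is a separate question that this does not address.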
Re: struct File. property size.
On Thursday, 11 May 2017 at 07:24:00 UTC, AntonSotov wrote:
>     import std.stdio;
>
>     int main()
>     {
>         auto big = File("bigfile", "r+"); // bigfile size 20 GB
>         writeln(big.size); // ERROR!
>         return 0;
>     }
>
>     // std.exception.ErrnoException@std\stdio.d(1029): Could not seek in file `bigfile` (Invalid argument)
>
> Can I not work with a large file?
> 32 bit executable.

It seems you cannot :) Files bigger than 4G are still problematic on many platforms.
struct File. property size.
    import std.stdio;

    int main()
    {
        auto big = File("bigfile", "r+"); // bigfile size 20 GB
        writeln(big.size); // ERROR!
        return 0;
    }

    // std.exception.ErrnoException@std\stdio.d(1029): Could not seek in file `bigfile` (Invalid argument)

Can I not work with a large file?
This is a 32 bit executable.
Re: Processing a gzipped csv-file by line-by-line
On Wednesday, 10 May 2017 at 22:20:52 UTC, Nordlöw wrote:
> What's the fastest way to on-the-fly-decompress and process a gzipped csv-file line by line?
>
> Is it possible to combine
>
> http://dlang.org/phobos/std_zlib.html
>
> with some stream variant of
>
> File(path).byLineFast
>
> ?

I was curious what byLineFast was, I'm guessing it's from here: https://github.com/biod/BioD/blob/master/bio/core/utils/bylinefast.d.

I didn't test it, but it appears it may pre-date the speed improvements made to std.stdio.byLine perhaps a year and a half ago. If so, it might be worth comparing it to the current Phobos version, and of course iopipe.

As mentioned in one of the other replies, byLine and variants aren't appropriate for CSV with escapes. For that, a real CSV parser is needed. As an alternative, run a converter that converts from CSV to another format.

--Jon