Re: Fastest way to count number of lines
How giant is a "giant text file"? On my machine, counting the lines of a 75M file takes roughly 0.12 sec (it is dummy data, so not very random). If the files are gigabytes in size, then close enough might be good enough. I didn't see @cblake mention it, but you could count the bytes needed to read 100 lines of a big file and use that to estimate the overall number of lines.
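A rough sketch of that sampling idea, assuming the first lines are representative of the whole file. The proc name `estimateLineCount` and its `sampleLines` parameter are made up for illustration; only `getFileSize` and `memSlices` come from the standard library.

```nim
import os, memfiles

proc estimateLineCount(path: string, sampleLines = 100): int =
  ## Estimates the line count from the mean length of the first
  ## `sampleLines` lines. Exact when the file has fewer lines than that.
  let total = int(getFileSize(path))
  if total == 0: return 0            # an empty file cannot be memory mapped
  var mf = memfiles.open(path)
  defer: mf.close()
  var sampled = 0
  var sampledBytes = 0
  for slc in memSlices(mf):
    inc sampled
    sampledBytes += slc.size + 1     # +1 for the newline that memSlices strips
    if sampled == sampleLines: break
  if sampled < sampleLines or sampledBytes >= total:
    return sampled                   # the sample already covered the whole file
  # file size divided by the mean sampled line length
  result = int(float(total) / (float(sampledBytes) / float(sampled)))
```

The estimate is only as good as the sample: a file whose header lines are much shorter or longer than the rest will be off by the same ratio.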
Re: Fastest way to count number of lines
So, one other thing that is _probably_ obvious but bears mentioning just in case it didn't occur to @alfrednewman: the number of addressable/seekable bytes in a file is usually maintained by any modern filesystem on any modern OS as cheaply accessed metadata. So, if what you really need is not an exact count but a "reasonable bound" that could be violated once in a while, then you may be able to _really_ accelerate your processing.

As one example text corpus that we all have access to, the current Nim git repo's `lib/nim/**.nim` files have an average line length of 33 bytes and a standard deviation of about 22 bytes, as assessed by this little program:

    import os, memfiles

    when isMainModule:
      var inp = memfiles.open(paramStr(1))  # file to analyze, given as first argument
      var counter = 0
      var sumSq = 0
      for slc in memSlices(inp):
        inc(counter)
        sumSq += slc.size * slc.size
      echo "num lines: ", counter
      # inp.size also counts the newline bytes that memSlices strips
      let mean = float(inp.size) / float(counter)
      echo "mean length: ", mean
      let meanSq = float(sumSq) / float(counter)
      echo "var(length): ", meanSq - mean*mean

You could probably reasonably bound the average number of bytes per line as, say, (average + 4*stdev), which in this case is about 33+22*4 =~ 121 bytes; maybe round up to 128. Then you could do something like:

    import os
    var reasonableUpperBoundOnLineCount = int(float(getFileInfo(myPath).size) / float(128))

If you use that bound to allocate something then you are unlikely to over-allocate memory by more than about 4X, which isn't usually considered "that bad" in this kind of problem space. Depending on what you are doing, you can tune that parameter, and you might need to be prepared in your code to "spill" past a very, very rare 4 standard deviations tail event.

This optimization will beat the pants off even any AVX512 deal that iterates over all the file bytes, at least for this aspect of the calculation. It basically eliminates a whole pass over the input data in a case that is made common by construction.
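To make the "allocate from metadata, spill if wrong" idea concrete, here is a minimal sketch. The proc `lineLengths` and its `bytesPerLineGuess` parameter are hypothetical names; the divisor is the tunable knob discussed above, where a larger divisor means a smaller initial capacity and more frequent spills, and a smaller one means more over-allocation.

```nim
import os, memfiles

proc lineLengths(path: string, bytesPerLineGuess = 128): seq[int] =
  ## Collects every line's length, sizing the seq up front from the file
  ## size alone (no counting pass over the data).
  let total = int(getFileSize(path))
  result = newSeqOfCap[int](total div bytesPerLineGuess + 1)
  if total == 0: return              # an empty file cannot be memory mapped
  var mf = memfiles.open(path)
  defer: mf.close()
  for slc in memSlices(mf):
    # `add` grows the seq transparently if the guess was too small;
    # that reallocation is the "spill" case.
    result.add slc.size
```

The result is correct either way; the guess only decides how often the seq has to be regrown.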
Since you have seemed pretty dead set on an exact calculation in other posts, a small elaboration upon the "embedded assumptions" in this optimization may be warranted. All that is really relied upon is that some initial sample of files can predict the distribution of line lengths "well enough" to estimate some threshold (that "128" divisor) with "tunably rare" spill-overs, rare enough to not cause much slowdown in whatever ultimate calculation you are actually doing, which you have not been clear about.

Another idea along these lines, if, say, the files are processed over and over again, is to avoid re-computing all those `memchr()`s by writing a little proc/program to maintain an off-to-the-side file of byte indexes to the beginnings of lines. The idea here would be that you have two sets of files: your actual input files and some paired file "foo.idx", with foo.idx containing just a bunch of binary ints in the native format of the CPU that are either byte offsets or line lengths, effectively caching the answer of the `memchr`. If you had such index files, then when you want to know how many lines a file has you can `getFileInfo` on the ".idx" file and know immediately. You could be careful and check modification times on the .idx and the original data file and that sort of thing, too.

Why, you could even add a "smarter" `memSlices` that checked for such a file and skipped almost all its work, and an API call `numSlices` that skipped all the work if the ".idx" file is up-to-date according to time stamps.

Basically, except for actually "just computing file statistics", it seems highly unlikely to me that you should really be optimizing the heck out of newline counting in and of itself beyond what `memSlices/memchr()` already do.
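A sketch of what such an index cache could look like, under some assumptions of mine: the `.idx` suffix is appended to the full file name, each entry is one native `int64` byte offset of a line start, and staleness is judged purely by modification time. The procs `idxPath`, `buildIndex` and `cachedLineCount` are hypothetical names, not an existing API.

```nim
import os, memfiles, streams, times

proc idxPath(path: string): string = path & ".idx"   # assumed naming scheme

proc buildIndex(path: string) =
  ## Writes one native int64 per line: the byte offset where the line starts.
  let strm = newFileStream(idxPath(path), fmWrite)
  if strm.isNil:
    raise newException(IOError, "cannot write " & idxPath(path))
  defer: strm.close()
  if getFileSize(path) == 0: return    # empty data file gives an empty index
  var mf = memfiles.open(path)
  defer: mf.close()
  for slc in memSlices(mf):
    strm.write(int64(cast[int](slc.data) - cast[int](mf.mem)))

proc cachedLineCount(path: string): int =
  ## Line count read off the index file's size; the index is rebuilt only
  ## when it is missing or older than the data file.
  let idx = idxPath(path)
  if not fileExists(idx) or
     getLastModificationTime(idx) < getLastModificationTime(path):
    buildIndex(path)
  result = int(getFileSize(idx)) div sizeof(int64)
```

After the first call, `cachedLineCount` touches only file metadata. Time stamp granularity varies by filesystem, so a data file rewritten within the same tick as the index would not be detected as newer.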
Which FUSE library shall I use?
Hello all,

After some searching I noticed 2 libraries used for FUSE:

* [https://github.com/zielmicha/reactorfuse](https://github.com/zielmicha/reactorfuse)
* [https://github.com/akiradeveloper/nim-fuse](https://github.com/akiradeveloper/nim-fuse)

I'm starting a FUSE project and was wondering which is best to use with the current Nim 0.17.x?
Re: What's happening with destructors?
> Is the doc on regions now obsolete? It would seem that destructors now obviate much of the need for regions.

It's a bit early to say but I think so, yes.

> Is all of that coming, or is it "just an idea" you want feedback on?

Destructors, assignment operators, and the move optimization are coming behind a `--newruntime` switch and are expected to be useful for you to tinker with within days/weeks. How to introduce even more moves ("sink parameters?") is unclear. The other outlined features have no ETA. The really hard part is replacing the existing runtime with one with a different performance profile ("yay, deterministic freeing, yay more efficient multi threading possibilities, ugh, overall slower?!"), and that is likely stuff for Nim v2. But yeah, feedback is always appreciated.
Re: What's happening with destructors?
@Araq I read the blog post. I have two questions:

1) Is all of that coming, or is it "just an idea" you want feedback on? I learned the new C++11 features recently, and most (all?) of those I thought were missing in Nim seem to be described in that post.

2) Is there already some rough ETA for "general availability"?
Re: what does macros.quote() do exactly?
Seems to give almost the same output:

    type
      Dumb112023 = ref object of RootObj
        contents: int

    method frobnicate(this112025: Dumb112023) {.base.} =
      echo "frobnicating!"

and I still don't understand why Dumb turns into Dumb112023; maybe I will later.
Re: Problem using
The error messages keep saying the issue is a mismatch with FlowVar[T]. In Chapter 6 of **Nim in Action** here is what it says they are. `FlowVar[T] can be thought of as a container similar to the Future[T] type, which you used in chapter 3. At first, the container has nothing inside it. When the spawned procedure is executed in a separate thread, it returns a value sometime in the future. When that happens, the returned value is put into the FlowVar container.` Here is updated `segcount` proc segcount(row, Kn: int): uint = var cnt = 0'u for k in 0..
Re: What should d0m96 work on in his next Nim livestream?
I think the most valuable effort is that which is effective at attracting more people to Nim, which is a force multiplier for less inspirational tasks. Everyone knows that things like IRC and other library improvements can be done given someone's time and effort. But many people are judging what Nim is capable of based on its server side web framework, and Nim has been MIA in the most popular web framework benchmarks.
Re: What's happening with destructors?
Thanks @Araq! I'll watch the livestream later. Is the doc on regions now obsolete? It would seem that destructors now obviate much of the need for regions.
Re: Windows installation
https://github.com/dom96/choosenim
Re: Windows installation
Download what it says and run `finish.exe`. Can't get much simpler than that. And the mingw project keeps changing its installer. It varies from good to awful so I prefer to not depend on it.
Re: Problem using
Why do you refuse to try cnt[i] as jlp765 suggests? Do you have an idea how plain cnt += should work? All the parallel calculated results should accumulate in this single variable. Then you may need something to control the access to it.
Re: Problem using
In the previous snippet I forgot the `spawn`. The code below compiles, but is slower. var cnt = 0# count for the primes, the '1' bytes for i in 0..
Re: Problem using
When I use `parallel` I get this compiler error: parallel: var cnt = 0'u # count for the nonprimes, the '1' bytes for i in 0..rescnt-1: # count Kn resgroups along each restrack cnt += spawn segcount(i*KB, Kn) <-- points to start of '(' sync() primecnt += cnt - ssozp5x1c1par.nim(155, 28) Error: 'spawn' must not be discarded
Windows installation
Hi, I just tried installing Nim following the directions from here: [https://nim-lang.org/install_windows.html](https://nim-lang.org/install_windows.html) You _CAN_ do better than this. For example, why are you not linking to [https://sourceforge.net/projects/mingw-w64](https://sourceforge.net/projects/mingw-w64)/ rather than [https://nim-lang.org/download/mingw64-6.3.0.7z](https://nim-lang.org/download/mingw64-6.3.0.7z)?
Re: nim-cookbook
Update: Looks like I can't delete the wiki directly! (I think this is for good reason; on a case by case basis - so that eg, rogue admins can't take their community hostage or whatever). Have made an official request to wikia to delete it, opening ticket #349088 on their system. It should hopefully be deleted within 2 business days. Wikia wiki is currently over here, but should hopefully disappear soon: [http://nim-lang.wikia.com/wiki/The_Nim_programming_language_Wiki](http://nim-lang.wikia.com/wiki/The_Nim_programming_language_Wiki)
Re: What should next Araq's live stream be about?
The recording is now available! [https://www.youtube.com/watch?v=KNUDGZuqfQM](https://www.youtube.com/watch?v=KNUDGZuqfQM)
Re: nim-cookbook
Hmm, then I'll put that on hold for now. Already created the wikia, but I'll see if I can nuke it.
Re: What should next Araq's live stream be about?
@wizzardx: Glad you like it. You can already see all of my streams, they're all in [this playlist](https://www.youtube.com/watch?v=UQ4RvUlXIDI&index=3&list=PLm-fq5xBdPkrMuVkPWuho7XzszB6kJ2My). And Araq's are in [his channel](https://www.youtube.com/channel/UCAIXKsgiEkRjwlNgduABgmw). We don't really have a formal way to give suggestions on what we should do in our livestreams. Best way is to just tell us on IRC, I guess. For me, Twitter will also work if that's easier. For IRC you can also use Gitter.
Re: nim-cookbook
IMO wikia is pretty bad because it includes ads and is overall very bloated. We used to have mediawiki (8+ years ago :)) but it attracted too many spammers. * * * As for the cookbook, awesome job! Please add a link to it on the [Nim website](https://github.com/nim-lang/website) (by creating a PR).
Re: What should d0m96 work on in his next Nim livestream?
Thanks for posting this, @Tiberium. Thanks to all who voted as well.
Re: Fastest way to count number of lines
Please also compare this thread: [https://forum.nim-lang.org/t/1164#18006](https://forum.nim-lang.org/t/1164#18006) I have not yet used SIMD instructions myself in Nim, but there are some hints in the Forum already. For line counting, the different end-of-line marks for Unix/Windows/Mac makes it a bit more complicated unfortunately.
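To illustrate the end-of-line complication, here is a small sketch that treats `\n` (Unix), `\r\n` (Windows) and a lone `\r` (old Mac) as one line ending each. It works on an in-memory string for clarity and makes no attempt at SIMD speed; the proc name `countLineEndings` is made up for this example.

```nim
proc countLineEndings(data: string): int =
  ## Counts LF, CRLF and lone CR as one line ending each.
  var i = 0
  while i < data.len:
    case data[i]
    of '\l':
      inc result
    of '\r':
      inc result
      # A CR directly followed by LF is a single Windows line ending,
      # so skip the LF to avoid counting it twice.
      if i + 1 < data.len and data[i + 1] == '\l':
        inc i
    else:
      discard
    inc i
```

A plain `memchr` scan for `\l` alone already gives the right answer for Unix and Windows files; only lone-CR files need the extra branch.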
Re: nim-cookbook
@wizzardx, please go ahead! I think your initiative (following the cookbook concept of @nimboolean) is completely valid. Anyway, @nimboolean already posted some interesting content at [https://github.com/btbytes/nim-cookbook/](https://github.com/btbytes/nim-cookbook/). So I think we would have to transcribe what's there for the new Wiki. Cheers!
Re: Fastest way to count number of lines
Guys, thank you for your help. @Stefan_Salewski, yes, speed is an important point for me. I found the link you provided (about SIMD) very interesting... however, I do not know how to do this using Nim. Could you please help? Also, to help newbies like me, I thought of including the answers from this thread in the cookbook wiki being created as per [https://forum.nim-lang.org/t/3259](https://forum.nim-lang.org/t/3259)
Re: nim-cookbook
Ideally: One of the core devs puts a mediawiki instance under the nim-lang.org server. But failing that, Wikia is a very nice place for starting and maintaining community wikis, eg: [http://bleach.wikia.com/wiki/Bleach_Wiki](http://bleach.wikia.com/wiki/Bleach_Wiki) Edit: Should I take the initiative and start it myself? I don't want to steal anyone's thunder, or do something that's too controversial amongst the Nim community.
Re: Problem using
Try

    for i in 0..rescnt-1:
      cnt[i] = spawn segcount(i*KB, Kn)

I think the issue is with `..<`
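For reference, a self-contained sketch of the one-slot-per-task pattern using the standard `threadpool` module (older compilers need `--threads:on`). `countChunk` is a hypothetical stand-in for the thread's `segcount`, since its body isn't shown here; the point is only that each `spawn` result lands in its own `FlowVar` and the sum is taken afterwards on the main thread.

```nim
import threadpool

proc countChunk(start, len: int): uint =
  ## Stand-in workload: counts the multiples of 3 in start ..< start+len.
  for k in start ..< start + len:
    if k mod 3 == 0:
      result += 1'u

proc parallelCount(chunks, chunkLen: int): uint =
  # One FlowVar per task, so no two tasks ever write to the same variable.
  var pending = newSeq[FlowVar[uint]](chunks)
  for i in 0 ..< chunks:
    pending[i] = spawn countChunk(i * chunkLen, chunkLen)
  # `^` blocks until the corresponding task has delivered its value.
  for fv in pending:
    result += ^fv
```

Compared with `cnt += ^spawn ...` inside the loop, which waits for each task before starting the next, this starts all tasks first and only then blocks on the results.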
Re: nim-cookbook
@wizzardx This is nice. Your idea really makes more sense. Count on me to help. How to configure a wiki ? What would be the best link (URL)?
Re: Beginner question about nil access
The problem here is that result is not initialized to "" by default. As stated, this will change in the future.
Re: Beginner question about nil access
Some things I _really_ like here about Nim vs many other proglangs:

1. Nil access actually segfaults, rather than silent or undefined behavior.
2. You get a really awesome stack trace.

I use "not nil" wherever I can, too, but it's kind of a losing battle, since all the other code wants nillable types, so you have to add a lot of converters/checking code/etc, which kinda defeats the purpose.
Re: What's happening with destructors?
Ok, blog post is here: [https://nim-lang.org/araq/destructors.html](https://nim-lang.org/araq/destructors.html) I will copy the "spec worthy" parts that have been implemented already into a wiki page.
Re: What should next Araq's live stream be about?
Really really cool idea; I'm planning to watch all the livestreams, at least once they hit YouTube. I've subscribed to both Araq's and Dom's channels on there! Is there a suitable place for adding suggestions for future live streaming subjects? Is that best brought up in IRC perhaps? (I have never been on there; I'm more of a forum dweller.)
Re: nim-cookbook
Any chance we could do this on a regular wiki, rather than needing to make github pull requests? eg like this: [https://www.renpy.org/wiki/renpy/doc/cookbook/Cookbook](https://www.renpy.org/wiki/renpy/doc/cookbook/Cookbook) .
Re: Problem using
Now in your code there is no spawn at all! For parallel processing, you have to ensure that there are no conflicts when parallel tasks are accessing your data, otherwise the compiler may make copies of the data before, which may make it slow. And for parallel processing a good use of the CPU cache is also important -- many parallel processes will give no speed increase when data is always fetched from slow RAM instead of cache.
Re: Problem using
After reading the **Nim in Action** book I got it to compile by placing a `^` before `spawn`, but it makes the program slower. The problem has to do with `segcount` returning a `FlowVar[T]` mismatch. And when I use `parallel:` it won't compile, and shows even more errors. Doing more research. var cnt = 0# count for the primes, the '1' bytes for i in 0..
Re: Fastest way to count number of lines
If speed is really important for you, you may consider SIMD instructions. D. Lemire gave an example for this in his nice blog: [https://lemire.me/blog/2017/02/14/how-fast-can-you-count-lines](https://lemire.me/blog/2017/02/14/how-fast-can-you-count-lines)/
Re: Fastest way to count number of lines
@jlp765 - good catch. I thought of that, too (I actually wrote that `memSlices` stuff), and almost went back and added a note later, but you beat me to it. I still am unaware about relative timings on platforms other than what I personally use and would be interested to hear reports, but on Linux/glibc `memSlices` (or more generally `mmap+memchr` however that is invoked) is always fastest in my tests.
Re: Fastest way to count number of lines
Even faster (avoiding some string allocations):

    import memfiles
    var i = 0
    for line in memSlices(memfiles.open("foo")):
      inc(i)
Re: Fastest way to count number of lines
It sounds like you will have many regular files (i.e., random access/seekable inputs as opposed to things like Unix pipes). On Linux with glibc, memfiles.open is probably the fastest approach; its `lines` iterator uses memchr internally to find line boundaries. E.g. (right from the memfiles documentation),

    import memfiles
    var i = 0
    for line in lines(memfiles.open("foo")):
      inc(i)

Your mileage on this may vary from OS to OS or libc to libc. I have no idea which, if any, Microsoft/Windows versions have well-optimized libc memchr() implementations.
Fastest way to count number of lines
Hello,

Before processing a giant txt file, I need to know in advance how many lines that file has. Since I will have to process multiple files, it is important to perform this line counting operation as quickly as possible. What is the fastest way to know how many lines a txt file has? I am currently using the following:

    var i = 0
    for line in lines "input.txt":
      inc(i)

The code is very simple, so I think there should be some faster way to do it...
Re: nim-cookbook
@nimboolean, I will help you out with PRs very soon... thanks
Re: Beginner question about nil access
You can't add to a nil. Initialize it first.

    proc formatTodos(list: TodoList): string =
      result = ""
      for todo in list.items():
        result.add("Todo: " & todo.desc)
        result.add("\n")

(insert Vizzini saying: "you fell for one of the classic blunders")

There was talk of making "seq" and other "array like" variables auto-initialise to stop this happening. Not sure if that will make Nim 1.0 or if it will happen at all (or maybe the default "not nil" will be the solution).