Re: [A51] A5_Ilx: hung without any result
Hi Frank, Great to hear that we can crack A5/1 in near real-time. But I had some problems Does the test chain generator produce any output ? This should be the first step to verify if the new kernel is working. Such as: ./a5il_test Initialized CAL CAL Runtime version 1.4.736 Running on 2 GPUs Num threads 1152 Num threads 1152 Switch to single. Switch to single. Switch to double. Switch to double. Completed 4608 chains Completed 4608 chains Completed 6244 chains Completed 7580 chains Completed 4608 chains Completed 7730 chains Completed 6094 chains Completed 4608 chains Completed 7730 chains Completed 6094 chains ... ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Improved Kraken - Any news/updates/beta?
Over the last months I have been working on an improved low-latency version of Kraken. This work has been carried out more or less in secret, and has not been announced until now. The point of this version is to determine how quickly a well funded attacker can break A5/1. The code is reasonable well tested, but doesn't have any big performance gains except if you are using SSD disks, or have a slower CPU. What I have done is to write a new GPU A5/1 kernel that can both calculate chains and relieve the CPU from the final search operation. This kernel is slower than the bitsliced kernel when it comes to bulk processing A5/1 chains, but since each GPU thread only calculates 1 or 2 A5/1 instances, the latency (time it takes to complete a single chain) is noticeably lower. On my machine with 2 GPUs I can (re)crack a burst in under 5 seconds (with all look-ups cached in RAM). It is reasonable to assume that with the same code given enough GPU / SSD storage, it should be possible to perform the cracking operation in less than 2 seconds. I have made the code available for evaluation now, but it isn't properly integrated into the mainline (yet) http://traxme.net/a5/a5_ilx/ Steps to build: 1) Untar and build 2) Place A5IlStubs.cpp A5IlStubs.h in the Kraken build folder 3) Delete A5Cpu.so and A5Ati.so from the Kraken directory if present. Copy A5Il.so this location instead. 4) Insert #include A5IlStubs.h prior to both includes of ../a5_cpu/A5CpuStubs.h 5) Add A5IlStubs.cpp to the build command (build.sh) 7) Rebuild Kraken run This procedure replaces the A5Cpu.so library with A5Il.so which contains the new code. The new GPU code uses the bit population count instruction, introduced with DX11 to quickly count the number of bits set in the taps of the 3 LFSRs. This instruction is somtimes jokingly refered to as the canonical NSA instruction, since said agency had a habit of procuring computers that had such a CPU instruction. Also there is a neat assembly level optimization using ulerp4 (The only instruction that has 3 distinct sources) to shave of some cycles. The A5/1 clocking routine is as follows: ; Clock r100 - output to r110 func 11 ushr r101.xyz,r100.xyz,l12.xyz and r102.xyz,r101.xyz,l14.xyz u4lerp r102.w,r102.x,r102.y,r102.z ixor r102.w,r102.w,l1.w ixor r100.w,r102.w,r102.z and r101.xyz,r101.xyz,l1.xyz ixor r102.x,r101.x,r102.w ixor r102.y,r101.y,r102.w ixor r102.z,r101.z,r102.w and r104,r100,l13 icbits r106,r104 and r106.xyz,r106.xyz,r102.xyz ishl r100.xyz,r100.xyz,r102.xyz ior r100.xyz,r100.xyz,r106.xyz and r106.w,r106.w,l1.w ishl r110.x,r110.x,l1.x ior r110.x,r110.x,r106.w ret The library automatically switches between 1 and 2 instances pr thread, depending on load (A debug message is left in place). A further improvement may be to have a version with 4 instances pr thread, but that is partly outside the scope of just making a low-latency version. Summary: A well funded adversary can break A5/1 in near realtime (1-2 seconds) using only publicly available tables and programs. This will be most useful for intercepting frequency hopping voice calls on large and crowded cells. cheers, Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Problem with table convertion tool
Hi list, I'm trying to write the tables (.dlt format) to a disk. The whole process goes fine with Behemoth.py, but when I call kraken afterwards, a bunch of them load fine and then for 116.idx I get this: kraken: DeltaLookup.cpp:70: DeltaLookup::DeltaLookup(NcqDevice*, std::string): Assertion `offset-0x7fff' failed. Aborted I re-added the tables with Behemoth.py twice, each time I got this error on different tables. I believe the only thing that changed is the order in which I added the tables with Behemoth.py. Any ideas on what I am doing wrong? For info the arch is i386. I'm positive that the computer hardware is not at fault (no bad ram or anything)... EiN' This indicates that the data in the dlt file is corrupted. Check the file size (should be similar to other files), and file checksum against the torrent SHA-1 or MD5 posted to the list. Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Behemoth and the torrent files
It was brought to my attention that Behemoth.py generates a ValueError if you try to run it on the set of files that are distributed by torrent. This is caused by the torrent files having been renamed. The immediate quick fix is to create a set of symbolic links with the just the table id as names.: tables/100.dlt - ../a51_table_100.dlt Behemoth.py will be updated so it can handle this naming variation as well. (soon) You have been warned :-) Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] A51 Digest, Vol 15, Issue 6
On Wed, 2010-08-11 at 15:48 +0200, moongaboonga moongaboonga wrote: Where can I find hashes for tables? 39GB per table is a lot and should be checked after download. Thank you! MD5 sums have been posted here: http://lists.lists.reflextor.com/pipermail/a51/2010-June/000675.html 284 is missing from this list, so it would be nice to have a checksum for the sake of completeness. (I don't have this table, can someone who does please post it.) Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Some small updates.
For those who haven't noticed, Kraken code has found a new home in git, and the latest version can be gotten by: git clone git://git.srlabs.de/kraken.git The latest version can handle multiple requests simultaneously, and these are identified by a unique job number. I have written some sample glue code in python that shows how it is possible to interface with Kraken over the server socket. The relevant snippets look like this: Simple function to read from socket: def ReadLine(s): r = c = s.recv(1) while c!='\n': r = r + c c = s.recv(1) return r And the code for processing a batch of known plaintext looks something like this #Open server connection s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect((127.0.0.1, 9100)) found = False for crk in cracks: pos = crk.find(crack) skip = False if pos0: cmd = crk[pos:] s.send(cmd+\n) while True: r = ReadLine(s) print r if (len(r) 5) and r[:6]==crack : break if (len(r) 5) and r[:6]==Found : find = r.split( ) key = find[1] keypos = find[3] #Another frame will have to be found from this group framenum1 = ??? framenum2 = ??? burst2 = ??? pipe = os.popen(./find_kc %s %s %s %s %s% (key,keypos,framenum1,framenum2,burst2)) result = pipe.read(); print result rv = result.split(\n) for r in rv: if r.find(MATCH)0: print r key = r.split( )[1:9] keystring = string.join(key, ) # This is your key print keystring found = True break if found: break #close connection s.close() I hope this can prove useful. Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Tableformat and Kraken
On Thu, 2010-07-29 at 15:10 +0200, Axel Walsleben wrote: Can you give me some IDs/advance parameter that have not yet computed ? Its better i make some, that are not already there. ;) thanks Table IDs 600,610,620...,690 are available. Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Tableformat and Kraken
On Thu, 2010-07-29 at 17:19 +0200, Axel Walsleben wrote: my genTable selftest runs only on 1 GPU. This is the Output from Dmesg: [6.441438] fglrx: module license 'Proprietary. (C) 2002 - ATI Technologies, Starnberg, GERMANY' taints kernel. [6.441443] Disabling lock debugging due to kernel taint [6.501099] [fglrx] Maximum main memory to use for locked dma buffers: 3801 MBytes. [6.501153] [fglrx] vendor: 1002 device: 6898 count: 1 [6.501156] [fglrx] vendor: 1002 device: 6898 count: 2 [6.501159] [fglrx] vendor: 1002 device: 6898 count: 3 [6.501521] [fglrx] ioport: bar 4, base 0x9000, size: 0x100 It Seems, that all 3 HD5870 were detected. The Sample Program FindNumDevices tolds me that i only have 1 Device. Does anybody have an Hint for me ? thxs You need to edit GenTable.py - it is clearly marked where you should edit the number of GPUs to use. (I know I could use all by default) While you are at it, take some time to study the auto increment feature. You will have to edit the increment (10) and the cutoff point (690). When these settings are correctly adjusted, you can leave your machine running for weeks, and it will produce 1 table after another without any intervention from your side, just make sure you have enough free disk space. As the tables are completed, you run sort2 on the output data, and you can delete the original files, freeing up space as you go along. (You can also use TableConvert and compress the tables to free up even more space.) cheers, Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Kraken, now in server mode
Rather than fighting with torrent creation tools, I spent my time writing a TCP/IP server core and bolted it on to Kraken. This means that you can specify a port when starting Kraken, and the program will the go into server mode, and accept incoming connections on the port. I have set an initial limit of 25 clients but more can be added if needed. Currently crack is the only supported command, but I am thinking to add some table selection logic. The next step is to write a front end that takes bitvectors and framecounts and feeds those to 1 or more servers. The front end should also perform the find_kc check on the results. With this setup it should be easy to set up a distributed Kraken network, reducing cracking times dramatically, without resorting to expensive SSD media. Enjoy! Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Reporting in.. (what's there, what's missing and some ideas?)
On Wed, 2010-07-28 at 19:20 +0200, Fabio Pietrosanti (naif) wrote: 1) Airprobe dump the phone call traffic - We know that it require important improvement for demodulation of real signals - We have to see which is the best pratical approach to do it, to detect the call, to follow it and which procedure must be implemented 2) Kraken crack the call a5/1 Kc key (that's the most important piece) 3) Some piece of sw decrypt the a5/1 encrypted dump generated by Airprobe with the Kc cracked by Kraken. There is a intermediate step here which one shouldn't forget. One needs to find and identify known plaintext, which can be different from network to network. So for initial decryption one will gave to find a way to get Kc from ones SIM card, and use that to decrypt and analyze call setup (on own conversations). This item is probably already made, but should be on the list. An alternative may be to use a straight dump from a Nokia phone. Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] NVIDIA Tesla C1060 GPGPU
On Mon, 2010-07-26 at 18:00 +0100, Cal Leeming [Simplicity Media Ltd] wrote: This might be a silly question but.. If there was a miscalculation due to hardware, would you be able to detect such an occurrence, and re-process that calc again? One of the big advantages with Tesla is the use of registered memory for the GPU. This significantly reduces the risk of a bad memory bit screwing up your calculations, and the hardware can let you know if such an error exists. ATI doesn't offer this feature on any of their boards, and that is pretty much the reason for them being absent from the HPC space, despite having better performance in relation to both price and power. f ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Kraken now on GPU
I just finished a major overhaul of the ATI shared library, and Kraken is now able to take advantage of the speed that is offered by the GPU. The work is not finished yet, as I want to further reduce latency by cracking several bursts in parallel, this will better utilize the GPU pipeline. At any rate I have it to the point where disk IO is the bottleneck, and to my dismay I discovered that 2 of my disks were much slower than the others, and a job will only go as fast as the slowest disk allows for. It could be the motherboard / controller, or the disks themselves. I need to investigate. Kraken will only load the ATI shared library if it is present, so A5Ati.so will have to be placed manually in the working directory (same as kraken) and work will automatically offloaded to the GPU. The final keysearch is still done on the CPU, but this is limited to just 1 round of the rainbow table, the GPU handles the rest. As for distribution of the tables, I have gotten several requests for having disks shipped, payed for with PayPal. I am not sure I really trust this company to not freeze the accounts if someone complains about the contents of ext3 preformatted 2TB disks. I should be able to deliver these at cost for ~250 USD, this includes Norwegian sales tax (25%) which I haven't found a way to defray. But perhaps someone on the list could do better than this and set up shop in a country where overall costs are lower? And ideally, quality packaging materials are readily available :-) (We have already suffer 1 unfortunate disk crash) Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Finding Kc with Kraken (dotting the i's)
I made a very simple command line interface to Kraken, which has only 1 useful command (crack). Once fired up, you can then try to crack multiple bursts without reloading the tables every time. If you have some bursts that you want to crack such as: 3811417: 011100101011101011101101011010011101011101011011100100101100010111101000110101100101011100 3811424: 11100011000100011110001101010101110001101000100111010001111011101110000001 The first number is the frame COUNT used for mixing into A5/1 - it can be derived from the frame number in the following way: unsigned int fn2count(unsigned int fn) { unsigned int t1 = fn/1326; unsigned int t2 = fn % 26; unsigned int t3 = fn % 51; return (t111)|(t35)|t2; } The second burst can be cracked, and the command to and output from Kraken looks like this: Kraken crack 11100011000100011110001101010101110001101000100111010001111011101110000001 Cracking 11100011000100011110001101010101110001101000100111010001111011101110000001 Found a56290409b507d75 @ 37 Kraken This means a56290409b507d75 is the key that produces the output at postion 37 after 100 clockings. These numbers can then be fed into my latest tool: find_kc. This program will perform the backclocking, reverses the frame count mix, and the key setup mixing (based on some earlier programs that I wrote) - finally it can as an option take a second frame count together with the burst data as input, and use that to eliminate the wrong candidate Kcs from the backclocking. Example: fr...@quant:~/gsm/tmto-svn/tinkering/A5Util$ ./find_kc a56290409b507d75 37 3811424 3811417 011100101011101011101101011010011101011101011011100100101100010111101000110101100101011100 Found potential key (bits: 37) db18a071e4d1f057 - db18a071e4d1f057 Framecount is 3811424 KC(0): 2e 61 10 5e 80 93 5e 1c *** MATCHED *** KC(1): bc 44 48 ed 03 04 02 53 mismatch KC(2): d4 37 41 cf 3d 04 05 a5 mismatch KC(3): da 74 09 51 60 07 7b c7 mismatch KC(4): f3 f7 a8 3b f6 76 e6 5a mismatch The correct Kc is here: 2e 61 10 5e 80 93 5e 1c , and will produce both cipherstreams correctly, as well as all other cipherstreams, and can consequently be used to decrypt the entire call or SMS. (Byte order may have to be changed, depending on your other tools) How many more nails are needed for A5/1s coffin? :-) Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Announcing Berlin A5/1 rainbow table set.
On Thu, 2010-06-17 at 11:38 +0200, Sylvain Munaut wrote: 39-40 tables have been computed, in what we have dubbed the Berlin A5/1 rainbow table set. And how do we make use of them ? The tables that currently reside in Norway can be used with the demonstration program found here: http://reflextor.com/trac/a51/browser/tinkering/A5Util/a5faster.cpp This line of code is limited to using a single table at a time for the moment, but new cracking code is being worked on. The copy of the tables found in Berlin have been transmogrified into a format that only works with yet-to-be released tools. The Berlin A5/1 rainbow table set has a ~22% chance of recovering the key from 114 bits of cipher. More tables will possibly be added in a second installment. Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] old gsm 900 phone GA628
I would suggest that you check out the OsmocomBB project, unfortunatly interception of encrypted data us normally not possible on commodity phone HW. The reason for this is that decryption happens before error correction, and is therefore handled in the DSP at a level that normally is inaccessible to SW. f On Thu, 2010-06-17 at 15:29 +0200, moongaboonga moongaboonga wrote: I have an old Z80 based phone and a lot of data about it, including electrical scheme, firmware, tools for disassemble flash code, memory maps, etc. It is very simple from hardware point of view, but there is a lot code to disassemble. The good thing is if someone can accomplish that he has all what needs to listen traffic, gather gsm algorithms and everything what phone must have on the plate and what can be used in any way, like for this project for example. But this is too much for one man to accomplish. Is there any chance that someone is interested? Or at least opinion is that a job that is good way to dive in details about real gsm, for the price of time it consume? Best regards! ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51 ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Announcing Berlin A5/1 rainbow table set.
Great progress has been made in computing ATI tables with 100 extra clockings, we got some unexpected but very welcome help from university students, and at most there were 8 GPUs on the job. 2x5850 + 2x5870 + 2x5970 - allowing us to compute almost 2 TB of tables in around 4 weeks of time. Currently computation has been halted while we evaluate the coverage, look at better compressions schemes, and focus on more efficient ways to perform table lookups - a euphemism for cracking A5/1 i.e. recovering keys from cipherstream. 39-40 tables have been computed, in what we have dubbed the Berlin A5/1 rainbow table set. The Berlin reference I feel is significant, besides the fact that the ideas to create these tables originated there, it has also has been a collection point for assembling the tables. Moreover, Berlin was also a focal point for tensions during the cold war, tensions that in fact dictated the need for creating A5/1 in the first place. In some ways A5/1 was intended as a virtual wall erected by the West towards the East, to safeguard privacy of communication. Fortunately the physical wall that separated East West fell even before A5/1 was fielded in the first GSM networks in 1991. Still A5/1 continued to serve as a relatively effective protective measure for cellular communications. But over time, as computers, FPGAs eventually GPUs grew faster and faster, the once significant defenses of A5/1 started crumbling, and eventually they offered little or no protection. In response to the relentless advance of computing power, key-lengths were increased, but in short order the available arsenal of remedies where exhausted. Despite numerous claims that GSM encryption was at at the end of its useful life, the GSMA kept insisting that the security offered by A5/1 was adequate. Such denials and counterclaims, are obviously counterproductive and even dangerous. I therefore feel privileged to have taken part in this project, where hackers from both former East West have worked together on dismantling the remains of A5/1 - and effectively declared it completely dead and broken. Our hope is that this will bring about a shift towards proper security in cellular communications, and not further compromised solutions like A5/1 was from the outset. The tables that constitute the Berlin A5/1 rainbow table set. given by their IDs are as follows: 100 108 116 124 132 140 148 156 164 172 180 188 196 204 212 220 230 238 250 260 268 276 284 292 324 332 340 348 356 364 372 380 388 396 404 412 420 428 492 500 (284 492 are optional) Due to their size, theses tables are not easily copied over the Internet, so I have decided to resort to physical transfer in making copies available to research etc. This I will do by announcing some of my traveling to the list, and if there are interested receiving parties, I can bring along tables on hard disk(s) for replication. After some initial seeding, I believe there will be enough interest for these tables to make them go viral. In addition, anyone who finds themselves in Oslo, Norway are welcome to request a copy. The first available location to make a copy will be: * Bucharest, Romania, June 24th - July 5th 2010 Other arrangements can also be made, such as swapping preloaded disks for cash (165EUR @cost) at Schiphol airport. regards, Frank A. Stevenson ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Are dummy bursts encrypted? (Airprobe)
Another 'quick and easy' question perhaps. The SDCCH has a fixed bit burst rate, but because L3 messages are mostly processed in synchronous manner, there will be idle time when no messages need be sent. The BST is supposed to work at a constant power level, therefore these TDMA frames are filled with dummy bursts. But what happens when ciphering is enabled ? Will the dummy bursts still be all 0 bits, or will they be encrypted ? So far I have only had access to L3 logs that shows there is a time gap between individual messages, but do we know what those gaps contain ? I would imagine that signal processing on the receiving side will be easiest if the dummy bursts are also encrypted, otherwise handling error bits in a a received dummy burst becomes rather complicated. F ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Are dummy bursts encrypted? (Airprobe)
On Wed, 2010-06-02 at 12:56 +0200, M vd S wrote: at night you actually see them flying around by the hundreds, in clear text. This pattern, as well as other defined marker patterns, are defined in this spec: http://www.3gpp.org/ftp/Specs/html-info/0502.htm This document specifies that only Normal Burst can occur on an SDCCH, except for the occasional Access Burst during handover. I am not sure how much weight to give to this, since obviously some form of LAPDm filling must be present. The good news, is that page 55 onwards, shows how the frequency hopping sequence is determined :-) F ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Some updates
I would like to give a short update on what goes on a little behind the scene. Eagle eyed participants may have noticed some clues in the SVN repository and source tree. Firstly a lot of though and discussions have taken place around MvdS's post to the list about applying extra clockings in the rainbow table round function. And a consensus has more or less been reached, giving merit to the ideas as posted to the list. And I would personally like to thank MvdS for his theoretical contributions to the project. It has however taken quite a bit of time to verify these findings, and tune the table parameters for optimal performance. Karsten has been toiling over this part, and is probably best suited to explain the choices and candidates. Sascha and I have computed various test tables to verify some of the assumptions. Currently I am computing my first production table - and will then have to do some further verifications on the results. At this point I should underline, that the current generation of tables are still good, and will be used for lookup. But the ATI tables are slightly different, and will need some special attention at the sorting and lookup stage. I have 10 completed tables, and it would probably be best if the other ATI tables be sent my way after sorting to be integrated in the lookup process. However chances are that we will do a transition to tables with a new format. I hope this will be done in an orderly manner, even though it is hard to give a time line for how and when this will happen. I have maybe even gone out on a limb here in announcing this somewhat prematurely. But I think we have a duty to deliver timely information, as well as managing expectations. The current proposal entails computing fewer but larger tables, so for the distributed project the workload will have to be sliced vertically as well as horizontally, meaning a fair amount of work may have to be done in that area. So please don't ask when the new client will be ready. The good news is, granted that the new table format delivers as expected, we should be able to achieve the desired total coverage at an accelerated rate. Even so, I would like to remind anyone eager to start computing new tables, pushing forward the time of reaching desired coverage by a few months, means very little if the intercept capability required for using those tables is still lacking regards, Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Extreme storage solutions
On Mon, 2010-02-22 at 07:11 +0100, sascha wrote: The access time for 4 usb sticks is already 0.3ms. Their combined size is 64GB. Take 32 of these groups and you get down to 0.01ms (if it scales indeed). What i was able to test was that it scales up to 16 devices. I got 0.08ms access time with this configuration IIRC. Even if you need 2 or even 4 distinct USB host adaptors to connect 128 devices, you still end up faster and cheaper than with SSD. Besides that, those PCIe SSDs are still more expensive than 2.5inch SSDs. And we certainly do not need hundreds of megabytes per second bandwidth. The write performance is also of no concern. As for the optimization of the USB bus, that comes for free. If you start 4 CPU threads and manage to make each thread access the data blocks of a single device, then the access time improvements scale linearly with the number of devices. Nice! It seems that USB mass storage should have us well covered for 5K requests a seconds :-) I did some further experiments on a single stick, and got access times down to 0.4ms when accessing the block device directly. (Which means we do away with the filesystem altogether.) I added the test code to svn: tmto-svn/tinkering/various/speedy.cpp typically you have to do sudo time ./speedy /dev/sdX and get timing figures for 1 random reads. (Just under 4 seconds in my test) F ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] lookup tool for tables
On Sun, 2010-02-07 at 23:31 -0800, Sylv1 wrote: Hey, can anyone tell me more about the issues for the lookup stage with the new design? I would like to develop a lookup tool for the table on my local computer and would like to no more about what i should take care about. Thanks for the help, Cheers Sylvain --- On Sat, 2/6/10, sascha sascha.kriss...@web.de wrote: From: sascha sascha.kriss...@web.de Subject: Re: [A51] lookup tool for tables To: a51@lists.reflextor.com Date: Saturday, February 6, 2010, 11:18 AM the lookup tool is being worked on. with the latest changes in table layout there have come up some extra issues. and you wouldnt want a lookup tool that crashes regularly. ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51 Lookup tools have already been written. I have committed an early version to SVN : http://reflextor.com/trac/a51/browser/tinkering/A5Util - this code is only for 'educational' use. Lookup on the CPU is much to slow, so a GPU version will be needed. Please note that this tool uses a different compressed format for the tables, and may not be suitable for use with nvidia generated tables. (The unused bits in the starting points are at different locations) On the ATI front I have done some work with bypassing Brook+ runtime layer, and taken direct control of the buffers / stream management by using CAL. I was hoping to see some speed increase by avoiding some data copying, but that wasn't a real problem. Profiling shows that the ATI code leaves the GPU idle 3-7% of the time now. So there isn't much more to gain. What comes next is to write a better inter process communication system for table generation. The python wrapper seems to have problems with handling 8000 messages a second, and processes just dies on me. After this i will have to write a better job management system, so the GPUs can handle smaller work loads. Currently the ATI code needs a full pipeline to operate. Ideally such a system would be able to shift jobs between GPUs so that when there is little work to do, it will coalesce these jobs to a single GPU. At this point I should be ready for writing GPU lookup code. Well this is my time line for GPU lookup. F ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] 2.5x efficiency increase by removing illegal states
On Sun, 2010-01-10 at 22:41 +0100, sascha wrote: On Sun, Jan 10, 2010 at 09:36:51PM +0100, Frank A. Stevenson wrote: I think it is should be compared with what you get from purely random states, it could be that our round function has a propensity for generating invalid states, and that would be a major flaw in our approach. (which advance value are you using?) F or mail the source code. Or I could just do it myself, the back stepping source is available on http://traxme.net/a5/ . M vd S also mailed me an even faster version of this code. It is kind of a problem that there are lots of bits and pieces of the kitchen sink laying around in sundry places, but not comitted to SVN. Perhaps we should make a tinkering folder in the repository for these things ? ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] 2.5x efficiency increase by removing illegal states
On Mon, 2010-01-11 at 05:09 +0100, sascha wrote: due to the bitslicing none of those optimizations are applicable. I was expecting a smaller penalty also, due to the fact that i could avoid DRAM access. If there are real increases in efficiency to be had here, we could move away from the bitslicing approach. Non bitsliced code runs at around 180 chains pr second on a single GPU, whereas bitsliced runs at 500. I did notice however that non bitsliced code was running at around the same speed when using tables and some clever multiplication trick to drive the LFSRs. However DX11 has introduced the NSA instruction (bit population count) to the GPU instruction set. Using this to drive the LFSRs would make it faster than before, and in my implementation at least, processing 2 chains pr thread instead of just a single would give another speed increase. (Better GPU utilization, by filling slots that are unused due to data dependencies) A5/1 could be clocked forward with ~32 operands, making theoretic max speed about the same is the bitsliced speeds are in practice. So if efficiency can be increased by removing invalid states, non-bitsliced code may actually take the lead again. f ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] 2.5x efficiency increase by removing illegal states
On Sat, 2010-01-09 at 19:30 +0100, sascha wrote: Also note that the great majority of values in a table are never looked up but exist only as a link between the state we are interested in and the end value that is looked up in the data base. A false positive that does not pass the backclocking test is a rare case and does not influence the attack time very much. (is this true? how long does it take to to the backclocking?). Still we would need 2 times the storage if we use the old method. I have gotten false positives, that can't bee clocked back during testing of my table lookup code. Because of the very low current success rate, it is hard to give empirical evidence of the probability of such false misses, but I think we should be prepared for 50% false hit rate of this nature. Meaning ~50% of all key states recovered from the tables have no valid predecessor states at generation 100+. This would be in line with overall frequency of valid states in the table, and should not come as no surprise. f ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] 2.5x efficiency increase by removing illegal states
On Sat, 2010-01-09 at 23:39 +0100, sascha wrote: Another question is whether it is possible to convert an illegal state that produces the correct keystream to a legal state that produces the same keystream. When i generated some chains with a simple increment function generating the start values i got 92% chain merges in a 10M chains table which suggests that those states that produce the same keystream are only a few bit flips apart. I am not sure if a simple bit flip will do it, but perhaps it is possible by applying single or few clockings of some direction to some of the LFSRs, so you get a legal state, but yet produce the same output. Unfortunately when I found my false positives, I had thrown away the original keys, and had no basis to compare the false positive with the correct state. We should keep this in mind when we start testing the lookup code at a bigger scale: to keep the correct keys for reference and helping us understand better what the situation is with the false positives. f ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Table generation. Reminder + ATI script update
Please take note that the reflextor.com web site is not up to date with respect to the recommended table size. Both the fornt page, and the table parameter script gives a size of 380M - this should be 270M . Please ensure that tables are generated to the correct size. Also the ATI gen_table.py script has been updated in svn. The updated version fetches table advance parameters automatically, and begins computing a new table automatically upon completion (i.e. size reaches 270M) cheers, Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] Using multiple ATI cards
I just supplemented my computer with a second 5850 card, and I am now computing 2 tables at once. The instruction for doing this is rather easy. 1) Make a new folder with gen_table.py and A5Brook.so 2) Download new table parameters to a51id.cgi 3) Write export BRT_ADAPTER=1 before running gen_table.py Thus the table will be calculated on the specified GPU (1) Currently my computer is outputting 1000 chains / second. What I really need now is for the python script to automatically fetch new table parameters when a table has been completed (easy to do really, I may comitt this soon). But when I am not doing A51 (after all it is -25 centigrade outside) I also wrote a Mandelbulb renderer. Sample video here (in HD): http://www.youtube.com/watch?v=oe2bzLH65_A Then I had a brainwave: Making a combined Mandelbulb + A5/1 screensaver for the unwashed masses (Windows) should easily get us all the tables we need :-) Happy new year! Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] CPU implementation (more code)
sascha wrote: Regarding CPU implementations: I have attached a 64 wide SIMD implementation (single instruction 64 bits of data) that does 8 million a5/1 rounds per second on a single core of a 2ghz core2duo. I am not sure whether -O3 uses SSE automagically, so i assume not. I use 64 64bit wide memory locations (so effectively L1 cache) to store a51 registers vertically, and so i can generate a keystream bit for 64 independent a5/1 chains with a single set of instructions: (r118(regs) ^ r221(regs) ^ r322(regs)); so r118(regs) points to a 64bit memory location that stores bit 18 of R1 for 64 A5/1 engines. the performance bottleneck is the shifting of R1,R2,R3, since you have to touch each register. I like this idea (it would be great to make a practical implementation) ... without giving to much thought about it, it seems that the shifting can be avoided by unrolling the loop, and using the 64 registers as a ring-buffer. Hurray: more templates :-) At the end of 64 iterations, you could then logical or the 15 lsb-registers together, and inspect the result for 0 bits, (distinguising points) - The round functions can be held in another set of 64 bits registers/memory holding transposed round functions and the relevant round function(s) can be updated across the registers. Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
Re: [A51] Fast sorting compact tables
Frank A. Stevenson wrote: I wrote a table sorting function, ... Tarball is available for inspection at: http://traxme.net/a5/ (native endianness is assumed when reading the tables) F I have done more research on efficient table storage. My initial suggestion of using 1 million distict files would simply overwhelm a ext2 forametted flash device with inodes, so I had to reduce the number of files to a more manageble 4096. I have updated my tables sorting code accordingly (on the website for now, will add to svn later ) I have also added code that does the lookup into the compressed tables (not finished) The interesting figures from a 8GB Kingston datatraveler with 5MB/sec write speed are: 1000 lookups in a smaller table (40million chains) can be performed in 2.1 seconds from a cold device (newly inserted, with no locally cached inodes) - access speed is not an issue :-) Time to sort a complete table would be limited to the write speed of the device (~15 minutes). It is also interesting to note that the suggested table size should fit in 4GB in compressed format. So an even multiples of tables can easily be stored on devices in commonly available sizes. The big advantage with flash based storage is that it won't bog down your machine with a torrent of disk requests when doing th lookup work. F ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
[A51] ATI work in progress.
Hi, I am new to this list and project. I have for some weeks now been intrigued by the computational powers of the latest ATI cards, and couldn't resist buying a card late last week (ATI Radeon HD 5850). As a first project, I decided to write the A5/1 ATI implementation for this project. I have got the basics up and running, the round xor functions is missing, and some work need to be done on load balancing. I have problems with synchronizing CPU / GPU stalls Windows kernel. Preliminary test runs at 200 chains / second on a single card. (subject to verification) At this time I feel comfortable making a code drop, which can be found at: http://traxme.net/a5/a5br.zip This solution can be build with MS Visual Studio 2008 express. ATI Brook+ SDK v1.4 A test executable is included, but the workload is processed in 2-3 second chunks, and slower cards may have difficulties with this. ( Adjust the loop counter in a5.br ) I don't feel particularly inclined to do the TMTO project integration, I am of the conviction that namespaces more than 2 levels deep will rot the human brain :-p Frank ___ A51 mailing list A51@lists.reflextor.com http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51