Re: [julia-users] Edit Distance?
On Thu, Jan 16, 2014 at 12:06 AM, Kevin Squire wrote: > > BTW, have anybody ever seen a recursion using Fib, that at the end >> explicitly state that >> we can get the nth term without closed form expression. >> > > Do you mean "with" a closed form expression, such as with Binet's > Formula[1]? > But this closed form won't work in practice, right ? The constraints of floating point arithmetic. > > Kevin > > > [1] http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.html >
Re: [julia-users] Edit Distance?
On Wed, Jan 15, 2014 at 10:13 PM, Matthias BUSSONNIER < bussonniermatth...@gmail.com> wrote: > > Le 14 janv. 2014 à 21:45, Shashwat Anand a écrit : > > > > Assuming you have computed row P from 1 to Q (included) , you don't > need the values > > of row P-1 from 1 to Q (excluded) anymore. So you can use a circular > buffer. > > So you store Q+N-(Q-1) = N+1 values. > > > > > > Awesome ! > > I never thought, we can use circular buffer like this. Quite nifty an > approach it is. > > > > Why is it slow then ? Is it due to % operator ? > > Yes, probably I suppose, you can probably avoid the % operator at each > loop though > as you know in advance that you will get back to 0 index every N+1. > Haven't tried too much though. > > > > > See https://en.wikipedia.org/wiki/Longest_common_subsequence_problem > > Reduce the required space section (they don't explain why exactly but > once you > > see the matrix you'll get it) > > > > > > This one is textbook, use 2 rows and keep on alternating between them. > > A lot of problems like Longest Common Subsequence, Shortest Common > Supersequence etc are > > solved via this. Also this technique is fairly common in programming > competitions. > > Well, you apparently did CS classes, I never did, but get to the same > conclusion :-) > > > > > > I also found that "Dynamic Programing" is an over-used term. > > Crappy internet so can't even search on SO and wikipedia, but IIRC > Dynamic Programming means that you solve a problem by solving smaller > problem. > I have just too often come in contact with response to question like, > "this require Dynamic Programming and it is a highly advance technics > that only senior programmers can handle" for problem as simple that > Fibonacci sequence. > > Those sub problems needs to be overlapping though. When you solve a large problem by solving smaller problems, you are doing divide and conquer. Say merge sort. But if those problems are overlapping, you can convert recursion tree into a dag. Here it a rather fun game theory problem if you want to solve, source [1] There are 2 players A and B. N pots of gold are arranged in a line, each containing some gold coins (the player can see how many gold coins are there in each gold pot - perfect information). They get alternating turns in which the player can pick a pot from one of the ends of the line. The winner is the player which has highest number of gold coins at the end. The objective is to maximize the number of gold coins collected by A, assuming B also plays optimally. A starts the game. > BTW, have anybody ever seen a recursion using Fib, that at the end > explicitly state that > we can get the nth term without closed form expression. > You probably would want to read this. [2] [1] https://www.quora.com/Dynamic-Programming/How-do-you-solve-the-pots-of-gold-game [2] http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/05-dynprog.pdf
Re: [julia-users] Edit Distance?
On Wed, Jan 15, 2014 at 10:49 AM, Matthias BUSSONNIER < bussonniermatth...@gmail.com> wrote: > > > Le 14 janv. 2014 à 21:02, Shashwat Anand a écrit : > > > >> I just found that trying to consume the minimal memory (n+1) was > significantly > >> slower than using (2n) IIRC. > >> > > Just someone curious here. > > > > Assuming two strings of length M and N, we can do the naive edit > distance via dynamic programming, > > in O (MN) time and O (MN) space. > > > > However since only last row is used to determine the result, we can save > the space by using O (N) memory, > > which is basically 2*N memory. > > > > What I don't understand is, what is this minimal memory (n + 1) you just > mentioned ? > > > Assuming you have computed row P from 1 to Q (included) , you don't need > the values > of row P-1 from 1 to Q (excluded) anymore. So you can use a circular > buffer. > So you store Q+N-(Q-1) = N+1 values. > > Awesome ! I never thought, we can use circular buffer like this. Quite nifty an approach it is. Why is it slow then ? Is it due to % operator ? See https://en.wikipedia.org/wiki/Longest_common_subsequence_problem > Reduce the required space section (they don't explain why exactly but once > you > see the matrix you'll get it) > > This one is textbook, use 2 rows and keep on alternating between them. A lot of problems like Longest Common Subsequence, Shortest Common Supersequence etc are solved via this. Also this technique is fairly common in programming competitions. I also found that "Dynamic Programing" is an over-used term. > Why ? I have heard this at some other places too. Is this because it is a really broad category ? For example, a lot of graph theory falls under same area. On an unrelated note, I like writing a recursive code and then simple memoizing it. Seems to me the best way to convert a tree to directed acyclic graph. But cases like these are where we need table, where we need to save space. Or probably we can save space using recursion too, I am not aware of it though ? > -- > M > > > > > >
Re: [julia-users] Edit Distance?
On Wed, Jan 15, 2014 at 10:18 AM, Matthias BUSSONNIER < bussonniermatth...@gmail.com> wrote: > > Le 14 janv. 2014 à 16:57, John Myles White a écrit : > > > Thanks! > > > > I'll have to check that out. I was able to translate some of the > Wikipedia code fast enough to get something working for my purposes. > > > > That was more or less what I did, > > I just found that trying to consume the minimal memory (n+1) was > significantly > slower than using (2n) IIRC. > Just someone curious here. Assuming two strings of length M and N, we can do the naive edit distance via dynamic programming, in O (MN) time and O (MN) space. However since only last row is used to determine the result, we can save the space by using O (N) memory, which is basically 2*N memory. What I don't understand is, what is this minimal memory (n + 1) you just mentioned ? > I tried a little to look into what else could be done in the literature, > but was most of the time you need some assumption on the alphabet, > or the overhead seem too significant for short strings. > > You could also try to mimic how Python difflib and SequenceMatcher works, > it's much faster but does not always give the right result[1]. > > > While you are woking on diff, interested on trying to diff IPython > notebooks ? :-) > -- > M > > > > > [1]: http://carreau.github.io/posts/08-Dear-DiffLib.html > > > > -- John > > > > On Jan 14, 2014, at 3:18 PM, Matthias BUSSONNIER < > bussonniermatth...@gmail.com> wrote: > > > >> > >> Le 14 janv. 2014 à 15:08, John Myles White a écrit : > >> > >>> Is there a package out there to compute edit distances between strings? > >> > >> I started at some point, never really finished. > >> > >> https://github.com/carreau/Diff.jl > >> > >> -- > >> M > >> > >>> > >>> -- John > >>> > >> > > > >
Re: [julia-users] Re: Style Guideline
On Wed, Jan 1, 2014 at 12:35 AM, John Myles White wrote: > (4) Using both tabs and spaces is a huge problem in a shared codebase. > This is probably the only rule in my entire list that I’m actually going to > enforce in the code I maintain. IIRC, Python completely forbids mixing > these kinds of space characters at the language level. > +1 IMHO tabs should totally be avoided. We can configure our editors/IDE to behave tabs like spaces. I know vim does that [set expandtab], and we can use key for all general purpose and it expands to spaces. > > (7) + (8) These rules are part of the official Google style guides for R, > which is the language with the most similarity to Julia that’s being used > at companies with public facing style guidelines. I think they’re quite > sensible rules, which is why I decided to borrow them from published > standards. > > (18) + (19): This is clearly an area of big disagreement in our community. > I might pull them out into a suggestions section since I’d really prefer > that code submitted to things like DataFrames.jl follow this rule, but > don’t want to include a rule that’s going to be a big schism in the > community. > > (22) + (23) + (24): I may take these out as well. I definitely agree that > there’s a big difference between performance guidelines and style > guidelines, although that line is blurry when you’re trying to keep a > codebase written in a consistent style. > > (31): Comments aren’t PDF’s or HTML or any other language designed for > transmitting carefully formatted documents. You don’t get to use images, > properly formatted tables, etc. I find diagrams are an essential part of > good documentation. I think conflating documents with code leads to > documents that are less readable and lots of lines in code that’s not > actually worth reading. > > (35): I might take this one out as well. It’s somewhere on the boundary > between a performance tip and a style habit worth developing. > > — John > > On Dec 31, 2013, at 11:12 AM, Daniel Carrera wrote: > > > Personally, I do not think that a more thorough style guide is > necessarily better. That said, I will give you my comments: > > > > (4): I like tabs and I use them. > > > > (7) + (8): I disagree. Although I generally use comma+space as you say, > at times I deviate from that when I feel that doing so will improve the > clarity and readability of my code. > > > > (18)+(19): I disagree. Although I could favour rules like this in a > particular project, in many cases I think that adding type annotations just > creates syntactic noise and can create a needless limitation. > > > > (22)+(23)+(24): I do not think that performance tips belong in a style > guide. You could spend a lot of time writing performance tips and I don't > see an obvious reason why the three tips you chose are more important than > other performance tips. > > > > (31): I partially disagree. I like writing documentation (e.g. tutorial > or explaining an algorithm) at the top of the file. I like having the > documentation in the same file as the code that it refers to. I do not know > what you mean when you say that "English documents are more readable when > not constrained by the rule of code comments". What rules are those? > > > > Also, I rarely want to have a diagram in my documentation because that > involves starting a WYSIWYG program like LibreOffice or something like > that. I haven't really felt a lot of need for diagrams. > > > > > > (35): This doesn't sound like a style thing either. Advice on the > correct way to use a module, or how to maintain precision or avoid > round-off errors, do not belong in a style guide. This sort of thing > belongs in either the documentation for the module, or on some tutorial > about numerical computation. > > > > Cheers, > > Daniel. > > > > > > > > On Tuesday, 31 December 2013 10:01:23 UTC-5, John Myles White wrote: > > One of the things that I really like about working with the Facebook > codebase is that all of the code was written to comply with a very thorough > internal style guideline. This prevents a lot of useless disagreement about > code stylistics and discourages the creation of unreadable code before > anything reaches the review stage. > > > > In an attempt to emulate that level of thoroughness, I decided to extend > the main Julia manual’s style guide by writing my own personal style > guideline, which can be found at > https://github.com/johnmyleswhite/Style.jl > > > > I’d be really interested to know what others think of these rules and > what they think is missing. Right now, my guidelines leave a lot of wiggle > room. > > > > — John > > > >